Item response theory (IRT) models have been widely used in educational measurement testing. When there are repeated observations available for individuals through time, a dynamic structure for the latent trait of ability needs to be incorporated into the model, to accommodate changes in ability. Other complications that often arise in such settings include a violation of the common assumption that test results are conditionally independent, given ability and item difficulty, and that test item difficulties may be partially specified, but subject to uncertainty. Focusing on time series dichotomous response data, a new class of state space models, called Dynamic Item Response (DIR) models, is proposed. The models can be applied either retrospectively to the full data or on-line, in cases where real-time prediction is needed. The models are studied through simulated examples and applied to a large collection of reading test data obtained from MetaMetrics, Inc.

1. Introduction.

1.1. Background. Item response theory (IRT) models are frequently used in modeling dichotomous data from educational tests, since they allow separate assessment of the ability of examinees and the effectiveness of the test items. A typical one-parameter IRT model is of the form



$$\Pr(X_{il}=1\mid\theta_i,d_l)=F(\theta_i-d_l), \tag{1.1}$$

where $\theta_i$ indicates the ability of the $i$th person; $d_l$ indicates the difficulty of the $l$th test item; the item response variable $X_{il}$ is either 0 or 1, according to whether the $l$th test item taken by the $i$th person is answered incorrectly or correctly; and the item characteristic curve, $F(\cdot)$, is a cumulative distribution function (c.d.f.) from a continuous distribution. When $F(\cdot)$ is the standard logistic c.d.f., the one-parameter IRT model (1.1) becomes the famous Rasch model

$$\Pr(X_{il}=1\mid\theta_i,d_l)=\frac{\exp(\theta_i-d_l)}{1+\exp(\theta_i-d_l)}. \tag{1.2}$$

If $F(\cdot)=\Phi(\cdot)$, where $\Phi(\cdot)$ is the standard normal c.d.f., then

$$\Pr(X_{il}=1\mid\theta_i,d_l)=\Phi(\theta_i-d_l) \tag{1.3}$$

defines the one-parameter Normal Ogive or Probit model. We will focus on 
the former model in the paper, for reasons to be discussed later, although 
analysis of the Probit model is actually easier and can be done with a sim- 
plified version of the methodology developed here. 
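For a concrete feel for (1.2) and (1.3), the following minimal Python sketch evaluates both item characteristic curves at a few values of $\theta_i-d_l$. The function names are illustrative only and do not come from any package.

```python
import math

def rasch_prob(theta: float, d: float) -> float:
    """P(X = 1 | theta, d) under the Rasch model (1.2): logistic c.d.f. of theta - d."""
    return 1.0 / (1.0 + math.exp(-(theta - d)))

def probit_prob(theta: float, d: float) -> float:
    """P(X = 1 | theta, d) under the normal ogive model (1.3): Phi(theta - d)."""
    return 0.5 * (1.0 + math.erf((theta - d) / math.sqrt(2.0)))

for gap in (-2.0, 0.0, 1.0, 2.0):
    print(f"theta - d = {gap:+.1f}: Rasch {rasch_prob(gap, 0.0):.3f}, "
          f"probit {probit_prob(gap, 0.0):.3f}")
```

Both curves equal 0.5 when $\theta_i=d_l$; the logistic curve is flatter because the standard logistic distribution has variance $\pi^2/3$ rather than 1.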

The development of item response theory from the classical point of view owes much to the pioneering work of Lord (1953), Rasch (1961) and their colleagues. Among the many noteworthy contributions are Andersen (1970) and Bock and Lieberman (1970).

In classical IRT, it is assumed that the $X_{il}$ are independent, given the person's ability $\theta_i$ and the difficulty levels $d_l$. This is often referred to as
the local independence assumption. There are situations in which this as- 
sumption is violated. One such is computer adaptive testing, wherein the 
selection of the next test item typically depends specifically on the previous 
questions and answers. 

The situation is less clear for the program studied herein, MetaMetrics' educational assessment program called Computer Adaptive Instruction and Testing (CAIT). With CAIT, a test pool of articles is selected for the student based on an estimate of his/her current ability; the student selects an article from this pool and the test questions (described later) are then generated before reading commences. Thus, in the CAIT environment, possible violations of local independence could arise from sources such as article selection by the student and test questions related to the same article, so that overall understanding of the article could affect all answers; in this paper, such possible effects will be called test effects. Other factors that could cause violations of local independence include the health status and emotional status of the student on a given day; these will be referred to as daily effects. In the MetaMetrics scenario, there had been no previous demonstration of a violation of local independence through the presence of test effects or daily effects, and there was considerable interest in establishing such presence for possible enhancement of current models.

Pioneering papers that addressed local dependence were Stout (1987, 1990), who introduced the essential dimensionality and the essential independence of a collection of test items, and Gibbons and Hedeker (1992), who considered the conditional dependence within identified subsets of items by allowing random effects in the analysis. More recent work in this direction is testlet response theory modeling, proposed by Bradlow, Wainer and Wang (1999). They defined the testlet as a subset of items; for example, they defined a reading comprehension section in the SAT as a testlet. They then modified the classic IRT models by including a random effect term to represent the common factor affecting the responses in the testlet. Another approach to handling local dependence is the introduction of Markov structure, as in Jannarone (1986), where the conjunctive IRT kernel was introduced. A more recent example is Andrich and Kreiner (2010), who modified the Rasch model by allowing the conditional probability of a response to an item to depend on the answer to a previous item.

For the modeling in this paper, the random effect approach will be fol- 
lowed. Indeed, two levels of random effects will be introduced to model the 
daily effects and test effects, respectively. 

Another essential generalization of IRT models lies in their applicability to longitudinal data, that is, to scenarios in which an individual is tested repeatedly over time; the interest then typically centers on the growth of the individual's ability. Embretson (1991) and Marvelde et al. (2006) presented a multidimensional Rasch model to represent the change of an ability as an initial ability and one or more modifiabilities. Based on the belief that a person's ability increases over time, Albers et al. (1989), Tan et al. (1999) and Johnson and Raudenbush (2006) used linear or polynomial regression on time to measure the growth of an ability; their analyses required the same time span
and testing points for all examinees. Martin and Quinn (2002) modeled the 
transition of a voting preference as a first-order Markov process, where they 
assumed voting preference changes from the previous time point to a new 
point by a random shock; this work did not incorporate a time trend. Park 
(2011) supposed that changes in a voting preference were subject to discrete 
agent-specific regime changes and modeled the indicator of the preference 
regime changes as a first-order Markov process. Bartolucci, Pennoni and 
Vittadini (2011) analyzed test scores in mathematics observed over 3 years 
for public and private middle school students by a multilevel latent Markov 
Rasch model, where they described the dynamic transition of different levels 
of the individual ability also via a first-order Markov process. 

Our approach to the longitudinal issue is based on a new class of dy- 
namic linear models (DLM's) [see West and Harrison (1997) for background 
on DLM's]. The literature on DLM's or state space models, in the framework considered here of longitudinal binomial data, includes, for example, Carlin and Polson (1992), Fahrmeir (1992) and Czado and Song (2008), as well as the last three papers mentioned in the previous paragraph. Our models are
distinguished from the literature by simultaneously allowing for the follow- 
ing features: (i) observations at variable and irregular time points; (ii) con- 
tinuously changing ability, but with incorporation of knowledge concerning 
trends (e.g., increasing ability over time) in a nondogmatic way (thus accommodating, say, a drop in reading ability over a summer vacation); (iii) an
analysis that is either individual or hierarchical across a group of individuals, 
the latter allowing for "borrowing strength" in estimates of certain overall 
parameters; (iv) either a retrospective analysis based on the full data or a 
real-time analysis and prediction for an individual based on the data to date. 
Moreover, we consider the case in which the test item difficulties are nominally specified, as in CAIT, where the test items are often computer-generated and have theoretically determined difficulties. The actual item difficulties are quite uncertain, however, and this uncertainty is also accommodated in our analysis. Previous papers that introduced random effects for item parameters include Sinharay, Johnson and Williamson (2003) and De Boeck (2008).

1.2. Testbed application. The model developed in this paper is motivated 
by CAIT testing, as developed by MetaMetrics Inc. The main applied goals 
are as follows: 

• The original goal is to assess the appropriateness of the local independence 
assumption for this type of data. This evolves into the goal of better 
understanding the nature of the daily and test effects. 

• A second goal is to understand the growth in ability of students, by ret- 
rospectively producing the estimated growth trajectories of their latent 
abilities in the study. 

• A third goal is to enable on-line prediction of one's ability (based solely on 
data obtained up to that point), to enable a better assignment of reading 
materials to match his/her ability and to enable teachers to better assist 
students. 

The data considered are from a school district in Mississippi and consist of 1983 students who registered over two years in a CAIT reading test program conducted by MetaMetrics Inc. The students were in different grades
and entered and left the program at different times between 2007 and 2009. 
Individuals took tests on different days and had different time lapses be- 
tween tests. Because of the long periods of testing, a fully adaptive model 
accommodating continual changes in ability is needed. 

The data was generated during sessions in which a student read an article 
selected from a large bank of available articles. The articles in this bank had 
been assigned text complexity measured in Lexiles, using the Lexile Receptive Analyzer®, software developed by MetaMetrics Inc. to evaluate the semantic and syntactic complexity of a text. The Lexile measure represents either an individual's reading ability or the complexity of a text. The scale for Lexiles ranges from 0 to 1800, with 0 indicating no reading ability and 1800 being the maximum.
A session begins like this: a student selects from a generated list of articles having Lexile complexities in a range targeted to the current estimate of the student's ability. For the selected article, a subset of words from the article is eligible to be clozed, that is, removed and replaced by a blank. The computer, following a prescribed protocol, randomly selects a sample of the eligible words to be clozed and presents the article to the student with these words clozed. When a blank is encountered while reading the article, the student clicks it and then the true removed word, along with three incorrect options called foils, is presented. As with the target word, the foils are selected randomly according to a prescribed protocol. The student selects a word to fill in the blank from the four choices, and immediate feedback is provided in the form of the correct answer.

The dichotomous items produced by this procedure are called "Auto- 
Generated-Cloze" items. They are single-use items generated at the time of 
an encounter between a student and an article. If another student selects that 
same article to read, a new set of target words and foils is selected. Although 
it is not strictly impossible for an individual item to be taken by more than 
one student, such an occurrence is highly improbable. As a consequence, it is 
not feasible to obtain data-based estimates of item calibration parameters. 

Instead, the difficulties of the items generated for an encounter between a 
student and an article can be modeled as a sample from an ensemble of item 
difficulties associated with the article. The text complexity in Lexiles pro- 
vides a theoretical value for the ensemble mean. An estimated student ability 
in combination with assumptions about the ensemble allows calculation of 
a predicted success rate for the encounter. A comparison of the observed 
success rate with predicted, aggregated over many encounters, provides a 
basis for assessing the viability of the assumptions incorporated into the 
model. The predicted success rates in Table 1 in Stenner (2010) include the 
assumption that the mean of the ensemble of item difficulties for an article 
is given by its theoretical text complexity. The agreement with observed 
success rates supports that assumption. 
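The predicted success rate described above can be sketched as the average of the Rasch probability over the ensemble of item difficulties. The sketch below assumes, as Section 2.1 later does, that item difficulties are normally distributed around the article's text complexity (in logits), and it borrows the spread $\sigma=0.7333$ quoted in Section 4; `predicted_success_rate` is a hypothetical helper, and daily or test effects are ignored.

```python
import numpy as np

def predicted_success_rate(theta, a, sigma=0.7333, n_draws=200_000, seed=0):
    """Monte Carlo estimate of E[ logistic(theta - d) ] with d ~ N(a, sigma^2).

    theta: student ability (logits); a: article text complexity (logits), used as
    the ensemble mean difficulty; sigma: assumed spread of item difficulties.
    """
    rng = np.random.default_rng(seed)
    d = rng.normal(a, sigma, size=n_draws)   # one draw per hypothetical cloze item
    return float(np.mean(1.0 / (1.0 + np.exp(-(theta - d)))))

print(predicted_success_rate(1.0, 0.0))
```

Because averaging over item difficulties flattens the curve, the predicted rate for a student 1 logit above the text complexity comes out somewhat below the plug-in value $1/(1+e^{-1})\approx 0.731$.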

Although MetaMetrics data is typically presented in Lexile units, there 
is a simple linear transformation from Lexiles to logit units. We will utilize 
the more common logit units for all data and results in this paper. Note that this also motivates the use of the logistic IRT model in this paper: it preserves compatibility with the MetaMetrics data.

1.3. Preview. Because of the complexity of the model considered (and 
of the testbed data set), as well as the need to incorporate prior information 
into the model, the analysis will be carried out using Bayesian methodology 
and Markov chain Monte Carlo (MCMC) computational techniques. A side 
benefit of using these methodologies is that all uncertainties in all quantities 
are combined in the overall assessment of inferential uncertainty. The MCMC 
procedure utilizes a novel combination of Gibbs sampling together with a
block sampling scheme involving forward filtering and backward sampling. 
In Section 2 we formally describe the proposed models to capture the 
dynamic changes in a person's ability as well as the local dependence be- 
tween item responses. Section 3 presents the MCMC strategy to carry out 
the statistical inference. Section 4 tests the methodology on some simulated 
examples (where the truth is known). Section 5 applies the proposed models 
to the MetaMetrics data set. Section 6 draws conclusions from both sta- 
tistical and psychological sides, and points out some directions for future 
studies. 

2. Dynamic item response (DIR) models. This section formally intro- 
duces the proposed one-parameter DIR model. Although the focus is on 
generalizing one-parameter IRT models, it would be straightforward to sim- 
ilarly generalize two-parameter or three-parameter IRT models. 

2.1. The observation equation in DIR models. In a typical one-parameter IRT model (1.1), the index of the item response $X_{il}$ indicates the correctness of the $i$th person's answer to the $l$th question in a single test. Consider the more involved situation in which the individual completes a series of tests within a given day and over different days. Thus, the item response variable is $X_{i,t,s,l}$, which corresponds to the correctness of the answer to the $l$th item in the $s$th test on the $t$th day taken by the $i$th person. Here, $i=1,\dots,n$; $t=1,\dots,T_i$; $s=1,\dots,S_{i,t}$; and $l=1,\dots,K_{i,t,s}$.

Likewise, let $d_{i,t,s,l}$ represent the difficulty level of the $l$th item in the $s$th test on the $t$th day taken by the $i$th person. As described in the Introduction, we model the test difficulties as being nominally specified, but with uncertainty. Thus, we write

$$d_{i,t,s,l}=a_{i,t,s}+\varepsilon_{i,t,s,l}, \tag{2.1}$$

where $a_{i,t,s}$ indicates the ensemble mean difficulty for the items in the $s$th test taken by the $i$th person on the $t$th day, and $\varepsilon_{i,t,s,l}$ is the random deviation from this ensemble mean difficulty for the $l$th item within the $s$th test. In the scenario we consider, the value of $a_{i,t,s}$ is assumed to be known from the theoretical analysis of text complexity, while $\varepsilon_{i,t,s,l}$ is assumed to follow a normal distribution with zero mean and a variance $\sigma^2$ specified by the CAIT test design, denoted $\varepsilon_{i,t,s,l}\sim N(0,\sigma^2)$.

As mentioned in the Introduction, we will also incorporate a term of daily random effects, $\varphi_{i,t}$, as well as a term of test random effects, $\eta_{i,t,s}$, to account for the possible local dependence factors when person $i$ takes several tests during day $t$. It is assumed that $\varphi_{i,t}\sim N(0,\delta_i^{-1})$ and, letting $\eta_{i,t}=(\eta_{i,t,1},\dots,\eta_{i,t,S_{i,t}})'$ denote the vector of test random effects on day $t$ for individual $i$, that $\eta_{i,t}\sim N_{S_{i,t}}\bigl(0,\tau_i^{-1}I\mid\sum_{s=1}^{S_{i,t}}\eta_{i,t,s}=0\bigr)$, with differing and unknown precision parameters $\delta_i$ and $\tau_i$ for each individual $i$. Here $I$ is an $S_{i,t}\times S_{i,t}$ identity matrix. The multivariate normal distribution for $\eta_{i,t}$ is actually a singular multivariate normal distribution because it is conditioned on the sum of the day's test effects being zero, which is done to remove any possibility of confounding with the daily random effects. (In analysis and computation, this singular multivariate normal distribution is replaced by the corresponding lower-dimensional nonsingular multivariate normal distribution.)
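The singular prior on $\eta_{i,t}$ is easy to simulate, and its nonsingular $(S_{i,t}-1)$-dimensional version easy to write down. The sketch below rests on a standard fact about i.i.d. normals: if $z$ has i.i.d. $N(0,\tau_i^{-1})$ components, then $z-\bar z\mathbf{1}$ has exactly the distribution of $z$ conditioned on $\sum_s z_s=0$, namely $N(0,\tau_i^{-1}(I-\mathbf{1}\mathbf{1}'/S_{i,t}))$. The function names and the values $S_{i,t}=4$, $\tau_i=2$ are illustrative.

```python
import numpy as np

def sample_test_effects(S, tau, rng):
    """Draw eta_{i,t} = (eta_1, ..., eta_S) ~ N_S(0, tau^{-1} I | sum_s eta_s = 0)."""
    z = rng.normal(0.0, 1.0 / np.sqrt(tau), size=S)
    return z - z.mean()  # centring i.i.d. normals = conditioning them on a zero sum

def reduced_covariance(S, tau):
    """Covariance of the first S-1 components; the last one equals minus their sum."""
    return (np.eye(S - 1) - np.ones((S - 1, S - 1)) / S) / tau

rng = np.random.default_rng(1)
draws = np.array([sample_test_effects(4, tau=2.0, rng=rng) for _ in range(100_000)])
print(np.abs(draws.sum(axis=1)).max())      # zero up to rounding
print(np.cov(draws[:, :3], rowvar=False))   # close to reduced_covariance(4, 2.0)
```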

Finally, at the observation level, the dichotomous test data are modeled as

$$\Pr(X_{i,t,s,l}=1\mid\theta_{i,t},a_{i,t,s},\varphi_{i,t},\eta_{i,t,s},\varepsilon_{i,t,s,l})=F(\theta_{i,t}-d_{i,t,s,l}+\varphi_{i,t}+\eta_{i,t,s}),$$

where $\theta_{i,t}$ represents the $i$th person's ability on day $t$; we are thus assuming that a person's ability is constant over a given day, although there could be random fluctuations captured by the $\varphi_{i,t}$ and $\eta_{i,t,s}$. Letting $F(\cdot)$ be the logistic c.d.f., as previously discussed, and substituting (2.1) results in

$$\Pr(X_{i,t,s,l}=1\mid\theta_{i,t},a_{i,t,s},\varphi_{i,t},\eta_{i,t,s},\varepsilon_{i,t,s,l})=\frac{\exp(\theta_{i,t}-a_{i,t,s}+\varphi_{i,t}+\eta_{i,t,s}-\varepsilon_{i,t,s,l})}{1+\exp(\theta_{i,t}-a_{i,t,s}+\varphi_{i,t}+\eta_{i,t,s}-\varepsilon_{i,t,s,l})}. \tag{2.2}$$
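To make the observation equation concrete, the sketch below draws one test's worth of dichotomous responses given $\theta_{i,t}$, $a_{i,t,s}$, $\varphi_{i,t}$ and $\eta_{i,t,s}$. Item difficulties are generated as in (2.1) with $\varepsilon\sim N(0,\sigma^2)$; the default $\sigma=0.7333$ is the design value quoted in Section 4, and `simulate_test` is a hypothetical helper.

```python
import numpy as np

def simulate_test(theta, a, phi, eta, n_items, sigma=0.7333, rng=None):
    """Draw responses X_{i,t,s,1..K} for one test from the logistic model (2.2).

    theta: ability on the day; a: ensemble mean difficulty of the test;
    phi: daily effect; eta: test effect; sigma: s.d. of item deviations (assumed).
    """
    rng = np.random.default_rng() if rng is None else rng
    d = a + rng.normal(0.0, sigma, size=n_items)        # item difficulties, as in (2.1)
    p = 1.0 / (1.0 + np.exp(-(theta - d + phi + eta)))  # logistic F(theta - d + phi + eta)
    x = (rng.uniform(size=n_items) < p).astype(int)     # Bernoulli(p) responses
    return x, p

x, p = simulate_test(theta=1.0, a=0.8, phi=0.2, eta=-0.1, n_items=10,
                     rng=np.random.default_rng(0))
```

A positive daily effect raises every item's success probability on that day, and a shared $\eta_{i,t,s}$ does the same for all items of one test; this common shift is how the model induces local dependence.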

2.2. The system equation in DIR models. As mentioned in the Introduc- 
tion, both parametric growth models and Markov chain models have been 
utilized in contexts similar to that of this paper. Here we combine these ideas, 
through a generalization of dynamic linear models, to model an individual's 
ability growth trajectory over time. The proposed model is 



$$\theta_{i,t}=\theta_{i,t-1}+c_i(1-\rho\,\theta_{i,t-1})\Delta^{+}_{i,t}+w_{i,t}, \tag{2.3}$$

which has three terms, modeling how current ability, $\theta_{i,t}$ for the $i$th person on the $t$th day, relates to past ability and other factors. The first term is simply ability at the previous time point, $\theta_{i,t-1}$.

The second term is a parametric growth model. Here $c_i$ can be thought of as the average growth rate of the $i$th person's ability over time, and $\Delta_{i,t}$ is the time lapse between the person's $t$th test day and $(t-1)$th test day; it is truncated by a pre-specified maximum time interval $\Delta_{\max}$, that is, $\Delta^{+}_{i,t}=\min\{\Delta_{i,t},\Delta_{\max}\}$. Thus, $c_i\Delta^{+}_{i,t}$ would reflect the ability growth over the given time interval if the growth were indeed linear. However, this growth is truncated at $\Delta_{\max}$ (chosen herein to be 14 days), reflecting the fact that, when on vacation, the student's ability may not be growing. Furthermore, the growth rate often declines as ability increases (indeed, ability typically plateaus eventually), so that a linear growth model is often unsuitable when $\theta_{i,t}$ becomes large. The "correction factor," $-\rho\,\theta_{i,t-1}$ in (2.3), compensates for this effect, slowing down the linear growth as the ability level becomes larger; $\rho$ is the parameter controlling the rate of this adjustment, and could be known or unknown. In our testbed example, $\rho$ is known, based on experiments conducted at MetaMetrics [Hanlon et al. (2010)]. In principle, $\rho$ should be individual-specific, but it is distinguishable from $c_i$ only as the individual's ability level approaches maturation; our investigation of ability growth in the testbed data focuses on early-age students, so only the $c_i$ are made individual-specific.
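A direct transcription of (2.3) shows how the truncated lapse $\Delta^{+}_{i,t}$ and the correction factor interact. In the sketch below, $\Delta_{\max}=14$ days and $\rho=0.1180$ are the values quoted in the text (the latter in Section 4), while the growth rate, precision and schedule in the usage line are illustrative; `simulate_ability` is a hypothetical name.

```python
import numpy as np

def simulate_ability(theta0, c, phi, lapses, rho=0.1180, delta_max=14.0, rng=None):
    """Simulate theta_{i,0..T} from the system equation (2.3):
    theta_t = theta_{t-1} + c (1 - rho theta_{t-1}) min(Delta_t, Delta_max) + w_t,
    with w_t ~ N(0, Delta_t / phi); the noise variance uses the untruncated lapse."""
    rng = np.random.default_rng() if rng is None else rng
    theta = [theta0]
    for lapse in lapses:
        drift = c * (1.0 - rho * theta[-1]) * min(lapse, delta_max)
        w = rng.normal(0.0, np.sqrt(lapse / phi))
        theta.append(theta[-1] + drift + w)
    return np.array(theta)

path = simulate_ability(0.0, c=0.005, phi=1 / 0.0218**2, lapses=[1, 3, 7, 60, 2],
                        rng=np.random.default_rng(0))
```

In this sketch a 60-day gap (say, a summer break) contributes no more deterministic growth than a 14-day gap, although its random component is larger, and the drift vanishes as $\theta_{i,t-1}$ approaches $1/\rho\approx 8.47$ logits.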

As in all dynamic linear models, the third term, $w_{i,t}$ in (2.3), represents the random component of the change in the $i$th person's ability on the $t$th day. We assume it is $N(0,\phi^{-1}\Delta_{i,t})$, where $\phi$ is unknown. Note that this presumes that the random component of a person's ability change has a variance proportional to the time period between test days. Note, also, that we suppose that $\phi$ is common across individuals. The reason for this is clear from (2.2), in which the $\varphi_{i,t}\sim N(0,\delta_i^{-1})$ have individual-specific $\delta_i$; there would be a substantial risk of confounding in the likelihood between the $\delta_i$'s and $\phi^{-1}\Delta_{i,t}$ if the tests for a student were equally spaced in time.

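It is possible to rewrite (2.3) as a first-order Markov process, and this is beneficial for computational reasons. Indeed, since $c_i(1-\rho\,\theta_{i,t-1})\Delta^{+}_{i,t}=-c_i\rho\,\Delta^{+}_{i,t}(\theta_{i,t-1}-\rho^{-1})$, letting $\lambda_{i,t}=\theta_{i,t}-\rho^{-1}$ and $g_{i,t}=1-c_i\rho\,\Delta^{+}_{i,t}$, the system equation (2.3) becomes

$$\lambda_{i,t}=g_{i,t}\,\lambda_{i,t-1}+w_{i,t}, \tag{2.4}$$

where $w_{i,t}\sim N(0,\phi^{-1}\Delta_{i,t})$, and this is in the form of a standard dynamic linear model. (Note that $c_i$ and $\rho$ need to be known for this reduction.)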
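The reduction to (2.4) can be checked numerically: driven by the same noise draws, the recursion for $\lambda_{i,t}=\theta_{i,t}-\rho^{-1}$ reproduces the trajectory generated from (2.3). Apart from $\rho=0.1180$ and $\Delta_{\max}=14$, all values in this sketch are illustrative.

```python
import numpy as np

rho, c, delta_max = 0.1180, 0.006, 14.0               # c is illustrative
lapses = np.array([1.0, 5.0, 14.0, 40.0, 3.0])        # Delta_{i,t}
lapse_plus = np.minimum(lapses, delta_max)            # Delta^+_{i,t}
w = np.random.default_rng(2).normal(0.0, 0.02 * np.sqrt(lapses))  # shared noise w_{i,t}

theta = [0.5]                                         # theta_{i,0}
for t in range(len(lapses)):                          # form (2.3)
    theta.append(theta[-1] + c * (1 - rho * theta[-1]) * lapse_plus[t] + w[t])

g = 1.0 - c * rho * lapse_plus                        # g_{i,t}
lam = [theta[0] - 1.0 / rho]                          # lambda_{i,0}
for t in range(len(lapses)):                          # form (2.4)
    lam.append(g[t] * lam[-1] + w[t])

# Maximum discrepancy after mapping back: zero up to rounding.
print(np.max(np.abs(np.array(theta) - (np.array(lam) + 1.0 / rho))))
```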

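2.3. DIR model summary. To sum up, the one-parameter DIR model is constructed in two levels as follows:

System equation: $\theta_{i,t}=\theta_{i,t-1}+c_i(1-\rho\,\theta_{i,t-1})\Delta^{+}_{i,t}+w_{i,t}$,

Observation equation: $\Pr(X_{i,t,s,l}=1\mid\theta_{i,t},a_{i,t,s},\varphi_{i,t},\eta_{i,t,s},\varepsilon_{i,t,s,l})=\dfrac{\exp(\theta_{i,t}-a_{i,t,s}+\varphi_{i,t}+\eta_{i,t,s}-\varepsilon_{i,t,s,l})}{1+\exp(\theta_{i,t}-a_{i,t,s}+\varphi_{i,t}+\eta_{i,t,s}-\varepsilon_{i,t,s,l})}$,

where $w_{i,t}\sim N(0,\phi^{-1}\Delta_{i,t})$, $\varepsilon_{i,t,s,l}\sim N(0,\sigma^2)$, $\varphi_{i,t}\sim N(0,\delta_i^{-1})$, $\eta_{i,t}\sim N_{S_{i,t}}(0,\tau_i^{-1}I\mid\sum_{s=1}^{S_{i,t}}\eta_{i,t,s}=0)$, and $\Delta^{+}_{i,t}=\min\{\Delta_{i,t},\Delta_{\max}\}$, with the $a_{i,t,s}$, $\rho$, $\Delta_{i,t}$, $\Delta_{\max}$ and $\sigma$ being known and the $\theta_{i,t}$, $c_i$, $\phi$, $\delta_i$ and $\tau_i$ being unknown.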

3. Statistical inference for DIR models. In this section the Bayesian 
methods that will be used for statistical inference in DIR models are de- 
scribed. Computation is based on a Gibbs sampling scheme, in conjunction 
with forward filtering and backward sampling. 
3.1. Prior distributions for the unknown parameters. Prior distributions
in a Bayesian analysis must be specified carefully, but they can be either 
evidence-based priors, reflecting scientific knowledge of the system under 
study, or they can be objective priors, reflecting a lack of such knowledge but 
possessing good overall properties — for example, good frequentist properties 
[see, e.g., Berger (2006)]; a mix of both will be used in the analysis herein. 
Specification of evidence-based priors is, of course, context dependent and, 
here, will be done within the context of the MetaMetrics testbed application. 

A natural choice of the prior distribution for an individual's initial latent ability, $\theta_{i,0}$, is

$$\theta_{i,0}\sim N(\mu_{\theta_j},V_{\theta_j}),$$

where $\mu_{\theta_j}$ and $V_{\theta_j}$ are the mean and the variance, on a logit scale, of the population $j$ to which individual $i$ belongs (for instance, the individual's grade in school for the testbed application). For the average growth rate $c_i$ in system equation (2.3), the natural objective prior is a constant prior (since $c_i$ is a linear parameter), but we constrain $c_i$ to be positive, reflecting the belief that there is a positive learning rate; thus, we choose the prior

$$\pi(c_i)\propto I(c_i>0)\quad\text{for all } i.$$

Although $\phi$ is a scale parameter, it occurs at the system level of the two-stage model and, hence, the usual objective scale prior $1/\phi$ would result in an improper posterior; the computationally simplest adjustment is to use $\pi(\phi)=1/\phi^{3/2}$, which does result in a proper posterior. Similarly, for the scale parameters $\delta_i$ and $\tau_i$ we utilize the objective priors $\pi(\delta_i)=1/\delta_i^{3/2}$ and $\pi(\tau_i)=1/\tau_i^{3/2}$. A natural alternative would be to try to "borrow information" across individuals by utilizing gamma hyperpriors for the $\delta_i$'s and $\tau_i$'s. This complicates the computation, however, and does not seem necessary for the testbed application.

3.2. Posterior distribution. To facilitate the use of Gibbs sampling techniques in computation, we utilize a mixture-of-normals representation of the logistic distribution. From Andrews and Mallows (1974), if $Y$ has a logistic distribution with location parameter 0 and scale parameter 1 (variance $\pi^2/3$), denoted $\mathcal{L}(0,1)$, one can write the density as

$$f(y)=\frac{e^{-y}}{(1+e^{-y})^2}=\int_0^\infty\left[\frac{1}{\sqrt{2\pi}\,2\nu}\exp\left\{-\frac{1}{2}\left(\frac{y}{2\nu}\right)^2\right\}\right]\pi(\nu)\,d\nu, \tag{3.1}$$

where $\nu$ has the Kolmogorov-Smirnov (K-S) density

$$\pi(\nu)=8\sum_{n=1}^{\infty}(-1)^{n+1}n^2\nu\exp(-2n^2\nu^2),\qquad \nu>0. \tag{3.2}$$
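The identity (3.1) can be checked numerically. The sketch below evaluates the K-S density (3.2), switching for $\nu<1$ to the equivalent series $\pi(\nu)=(\sqrt{2\pi}/\nu^2)\sum_{k\ge1}\{(2k-1)^2\pi^2/(4\nu^2)-1\}\exp\{-(2k-1)^2\pi^2/(8\nu^2)\}$, obtained by differentiating the standard small-argument form of the K-S c.d.f., because (3.2) converges slowly near zero. It then integrates the bracketed normal density against $\pi(\nu)$ by the trapezoidal rule. The grid limits and term counts are our choices.

```python
import numpy as np

def ks_density(nu, n_terms=50):
    """K-S density (3.2) for nu > 0: alternating series for nu >= 1,
    equivalent Jacobi-theta series for nu < 1 (where (3.2) converges slowly)."""
    nu = np.asarray(nu, dtype=float)
    n = np.arange(1, n_terms + 1)[:, None]
    large = 8.0 * np.sum((-1.0) ** (n + 1) * n**2 * nu * np.exp(-2.0 * n**2 * nu**2), axis=0)
    b = (2 * n - 1) ** 2 * np.pi**2 / 8.0
    small = np.sqrt(2 * np.pi) / nu**2 * np.sum((2 * b / nu**2 - 1.0) * np.exp(-b / nu**2), axis=0)
    return np.where(nu < 1.0, small, large)

def trapezoid(fx, x):
    """Plain trapezoidal rule."""
    return float(np.sum(0.5 * (fx[1:] + fx[:-1]) * np.diff(x)))

nu = np.linspace(1e-3, 8.0, 40001)   # K-S mass outside this range is negligible
k = ks_density(nu)

def mixture_density(y):
    """Right-hand side of (3.1): N(0, 4 nu^2) density averaged over the K-S density."""
    sd = 2.0 * nu
    normal = np.exp(-0.5 * (y / sd) ** 2) / (np.sqrt(2 * np.pi) * sd)
    return trapezoid(normal * k, nu)

def logistic_density(y):
    """Left-hand side of (3.1)."""
    return float(np.exp(-y) / (1.0 + np.exp(-y)) ** 2)

for y in (0.0, 1.0, 3.0):
    print(y, mixture_density(y), logistic_density(y))
```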

Note that the density in square brackets in (3.1) is $N(0,4\nu^2)$. By using the idea of data augmentation from Tanner and Wong (1987), we consider a latent variable $Y_{i,t,s,l}$ for each response variable $X_{i,t,s,l}$, where $Y_{i,t,s,l}\sim N(\theta_{i,t}-a_{i,t,s}+\varphi_{i,t}+\eta_{i,t,s}-\varepsilon_{i,t,s,l},\,4\nu^2_{i,t,s,l})$, and define $X_{i,t,s,l}=1$ if $Y_{i,t,s,l}>0$ and $X_{i,t,s,l}=0$ otherwise. Integrating over $\nu_{i,t,s,l}$ with the K-S density makes $Y_{i,t,s,l}$ logistic around its mean, so it is then easy to show that $\Pr(X_{i,t,s,l}=1\mid\theta_{i,t},a_{i,t,s},\varphi_{i,t},\eta_{i,t,s},\varepsilon_{i,t,s,l})=\exp(\theta_{i,t}-a_{i,t,s}+\varphi_{i,t}+\eta_{i,t,s}-\varepsilon_{i,t,s,l})/\{1+\exp(\theta_{i,t}-a_{i,t,s}+\varphi_{i,t}+\eta_{i,t,s}-\varepsilon_{i,t,s,l})\}$, and the introduction of the latent variables $Y_{i,t,s,l}$ does not alter the model (except that there are now formally many more unknown parameters).
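The augmentation argument can also be checked by simulation: draw $\nu$ from the K-S distribution, then $Y\mid\nu\sim N(\mu,4\nu^2)$, and compare the frequency of $Y>0$ with the logistic c.d.f. at $\mu$. In this sketch $\nu$ is sampled by inverting the K-S c.d.f. on a grid, using the two standard series for that c.d.f.; this is only an illustration of the identity, not necessarily how the MCMC scheme in Appendix A updates $\nu$.

```python
import numpy as np

def ks_cdf(x, n_terms=50):
    """Kolmogorov-Smirnov c.d.f., using the series that converges fast in each range of x."""
    x = np.asarray(x, dtype=float)
    k = np.arange(1, n_terms + 1)[:, None]
    large = 1.0 - 2.0 * np.sum((-1.0) ** (k - 1) * np.exp(-2.0 * k**2 * x**2), axis=0)
    small = np.sqrt(2 * np.pi) / x * np.sum(
        np.exp(-((2 * k - 1) ** 2) * np.pi**2 / (8.0 * x**2)), axis=0)
    return np.where(x < 1.0, small, large)

grid = np.linspace(1e-3, 8.0, 80001)
cdf = np.maximum.accumulate(ks_cdf(grid))   # guard against rounding-level non-monotonicity

def sample_nu(size, rng):
    """Inverse-c.d.f. sampling of nu from the K-S distribution, on the grid above."""
    return np.interp(rng.uniform(size=size), cdf, grid)

rng = np.random.default_rng(3)
nu = sample_nu(200_000, rng)
for mu in (-1.5, 0.0, 0.7, 2.0):            # mu stands for the mean of Y in the text
    y = mu + 2.0 * nu * rng.standard_normal(nu.size)   # Y | nu ~ N(mu, 4 nu^2)
    print(mu, np.mean(y > 0), 1.0 / (1.0 + np.exp(-mu)))
```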

As $\varepsilon_{i,t,s,l}\sim N(0,\sigma^2)$, it can be marginalized out in the distribution of $Y_{i,t,s,l}$, resulting in $Y_{i,t,s,l}\sim N(\theta_{i,t}-a_{i,t,s}+\varphi_{i,t}+\eta_{i,t,s},\,4\nu^2_{i,t,s,l}+\sigma^2)$. Therefore, the one-parameter DIR models (2.2) and (2.3) can be rewritten, with latent variables $\{Y_{i,t,s,l}\}$, as

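$$\theta_{i,t}=\theta_{i,t-1}+c_i(1-\rho\,\theta_{i,t-1})\Delta^{+}_{i,t}+w_{i,t}, \tag{3.3}$$

$$Y_{i,t,s,l}=\theta_{i,t}-a_{i,t,s}+\varphi_{i,t}+\eta_{i,t,s}+\zeta_{i,t,s,l}, \tag{3.4}$$

$$\nu_{i,t,s,l}\sim\text{K-S distribution}, \tag{3.5}$$

where $w_{i,t}\sim N(0,\phi^{-1}\Delta_{i,t})$, $\varphi_{i,t}\sim N(0,\delta_i^{-1})$, $\eta_{i,t}\sim N_{S_{i,t}}(0,\tau_i^{-1}I\mid\sum_{s=1}^{S_{i,t}}\eta_{i,t,s}=0)$, and $\zeta_{i,t,s,l}\sim N(0,\psi_{i,t,s,l}^{-1})$ with $\psi_{i,t,s,l}^{-1}=4\nu^2_{i,t,s,l}+\sigma^2$.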

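Define $\theta=(\theta_1',\dots,\theta_n')'$, where $\theta_i=(\theta_{i,0},\theta_{i,1},\dots,\theta_{i,T_i})'$ for $i=1,\dots,n$; $c=(c_1,\dots,c_n)'$, $\delta=(\delta_1,\dots,\delta_n)'$ and $\tau=(\tau_1,\dots,\tau_n)'$; $Y=\{Y_{i,t,s,l}\}$, $\nu=\{\nu_{i,t,s,l}\}$ and $X=\{X_{i,t,s,l}\}$ for $l=1,\dots,K_{i,t,s}$, $s=1,\dots,S_{i,t}$, $t=1,\dots,T_i$ and $i=1,\dots,n$; $\varphi=\{\varphi_{i,t}\}$ for $t=1,\dots,T_i$, $i=1,\dots,n$; $\eta=\{\eta_{i,t,s}\}$ for $s=1,\dots,S_{i,t}$, $t=1,\dots,T_i$ and $i=1,\dots,n$; and $\eta^{*}_{i,t}=(\eta_{i,t,1},\dots,\eta_{i,t,S_{i,t}-1})'$. Then the joint posterior density of $\theta$, $Y$, $c$, $\delta$, $\tau$, $\varphi$, $\eta$, $\nu$ and $\phi$ given the data $X$, in the one-parameter DIR model, is proportional to

$$
\begin{aligned}
\pi(\theta,Y,c,\delta,\tau,\varphi,\eta,\nu,\phi\mid X)
&\propto\Bigl[\prod_{i=1}^{n}\pi(\theta_{i,0})\pi(c_i)\pi(\delta_i)\pi(\tau_i)\Bigr]\pi(\phi)\Bigl[\prod_{i=1}^{n}\prod_{t=1}^{T_i}\prod_{s=1}^{S_{i,t}}\prod_{l=1}^{K_{i,t,s}}\pi(\nu_{i,t,s,l})\Bigr]\\
&\quad\times\prod_{i=1}^{n}\prod_{t=1}^{T_i}\prod_{s=1}^{S_{i,t}}\prod_{l=1}^{K_{i,t,s}}\Bigl(I\{Y_{i,t,s,l}>0\}I\{X_{i,t,s,l}=1\}+I\{Y_{i,t,s,l}\le 0\}I\{X_{i,t,s,l}=0\}\Bigr)\\
&\quad\times\prod_{i=1}^{n}\prod_{t=1}^{T_i}\prod_{s=1}^{S_{i,t}}\prod_{l=1}^{K_{i,t,s}}\psi_{i,t,s,l}^{1/2}\exp\Bigl\{-\frac{\psi_{i,t,s,l}(Y_{i,t,s,l}-\theta_{i,t}+a_{i,t,s}-\varphi_{i,t}-\eta_{i,t,s})^2}{2}\Bigr\}\\
&\quad\times\prod_{i=1}^{n}\prod_{t=1}^{T_i}\delta_i^{1/2}\exp\Bigl(-\frac{\delta_i\varphi_{i,t}^2}{2}\Bigr)\,\tau_i^{(S_{i,t}-1)/2}\exp\Bigl(-\frac{\tau_i\,\eta^{*\prime}_{i,t}\Sigma_{i,t}^{-1}\eta^{*}_{i,t}}{2}\Bigr)\\
&\quad\times\prod_{i=1}^{n}\prod_{t=1}^{T_i}\Bigl(\frac{\phi}{\Delta_{i,t}}\Bigr)^{1/2}\exp\Bigl\{-\frac{\phi\,[\theta_{i,t}-\theta_{i,t-1}-c_i(1-\rho\,\theta_{i,t-1})\Delta^{+}_{i,t}]^2}{2\Delta_{i,t}}\Bigr\},
\end{aligned}
\tag{3.6}
$$

where

$$\Sigma_{i,t}^{-1}=\begin{pmatrix}2&1&\cdots&1\\1&2&\cdots&1\\\vdots&\vdots&\ddots&\vdots\\1&1&\cdots&2\end{pmatrix}_{(S_{i,t}-1)\times(S_{i,t}-1)},$$

and $I(Z\in A)$ is the indicator function equal to 1 if the random variable $Z$ is contained in the set $A$; $\pi(\theta_{i,0})$, $\pi(c_i)$, $\pi(\delta_i)$, $\pi(\tau_i)$ and $\pi(\phi)$ are the priors specified in the previous subsection, and $\pi(\nu_{i,t,s,l})$ is the K-S density defined at the beginning of this subsection. This is a proper posterior under very mild conditions; see Appendix C.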
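The Gaussian term for the test effects in (3.6) works with the first $S_{i,t}-1$ components $\eta^{*}_{i,t}$ and an $(S_{i,t}-1)\times(S_{i,t}-1)$ matrix with 2 on the diagonal and 1 elsewhere. A short NumPy check (with $S_{i,t}=5$, chosen arbitrarily) confirms that this matrix is the inverse of $I-\mathbf{1}\mathbf{1}'/S_{i,t}$, the covariance of $\eta^{*}_{i,t}$ up to the factor $\tau_i^{-1}$, and that its quadratic form equals $\sum_s\eta_{i,t,s}^2$ once the last effect is fixed by the zero-sum constraint.

```python
import numpy as np

S = 5
M = np.eye(S - 1) + np.ones((S - 1, S - 1))        # 2 on the diagonal, 1 off it
cov = np.eye(S - 1) - np.ones((S - 1, S - 1)) / S   # covariance of eta* (times tau)
print(np.allclose(M @ cov, np.eye(S - 1)))          # True: M is the precision matrix

eta_star = np.random.default_rng(4).normal(size=S - 1)
eta = np.append(eta_star, -eta_star.sum())          # last effect fixed by the zero sum
print(np.isclose(eta_star @ M @ eta_star, eta @ eta))  # True
```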

3.3. Computation. Computation is done by an MCMC scheme that samples from the posterior (3.6) via a block Gibbs sampling scheme, utilizing the forward filtering and backward sampling algorithm at a key point. The steps of the algorithm are given in Appendix A.
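As a hedged illustration of the forward filtering and backward sampling idea only (the authors' actual steps are those of Appendix A), the sketch below draws a whole path $\lambda_{i,0},\dots,\lambda_{i,T_i}$ for one individual from the Gaussian DLM (2.4). It assumes the other unknowns are held at their current Gibbs values, so that each test day contributes a single Gaussian pseudo-observation of $\lambda_{i,t}$; `ffbs` and its arguments are hypothetical names.

```python
import numpy as np

def ffbs(y, V, g, W, m0, C0, rng):
    """One draw of lambda_0..lambda_T from p(lambda | y) in the scalar DLM
        lambda_t = g_t lambda_{t-1} + w_t,  w_t ~ N(0, W_t)   (system, cf. (2.4))
        y_t      = lambda_t + v_t,          v_t ~ N(0, V_t)   (pseudo-observation)
    with lambda_0 ~ N(m0, C0). Arrays y, V, g, W hold t = 1..T at index t-1."""
    T = len(y)
    m, C = np.empty(T + 1), np.empty(T + 1)
    m[0], C[0] = m0, C0
    # Forward filtering (Kalman filter).
    for t in range(1, T + 1):
        a = g[t - 1] * m[t - 1]                   # prior mean of lambda_t
        R = g[t - 1] ** 2 * C[t - 1] + W[t - 1]   # prior variance of lambda_t
        K = R / (R + V[t - 1])                    # Kalman gain
        m[t] = a + K * (y[t - 1] - a)
        C[t] = (1.0 - K) * R
    # Backward sampling: lambda_T, then lambda_t | lambda_{t+1}, y_1..y_t.
    lam = np.empty(T + 1)
    lam[T] = rng.normal(m[T], np.sqrt(C[T]))
    for t in range(T - 1, -1, -1):
        R = g[t] ** 2 * C[t] + W[t]               # variance of lambda_{t+1} given y_1..y_t
        B = C[t] * g[t] / R
        mean = m[t] + B * (lam[t + 1] - g[t] * m[t])
        var = C[t] * W[t] / R                     # equals C_t - B g_{t+1} C_t
        lam[t] = rng.normal(mean, np.sqrt(var))
    return lam
```

On our reading of (2.4) and (3.4), a Gibbs sweep for person $i$ would set $y_t$ to the $\psi$-weighted average of $Y_{i,t,s,l}+a_{i,t,s}-\varphi_{i,t}-\eta_{i,t,s}$ minus $\rho^{-1}$, $V_t=1/\sum_{s,l}\psi_{i,t,s,l}$, $g_t=g_{i,t}$ and $W_t=\Delta_{i,t}/\phi$, and map the draw back via $\theta_{i,t}=\lambda_{i,t}+\rho^{-1}$.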

From the MCMC samples, statistical inferences are straightforward. For example, an estimate and 95% credible interval for the latent ability trait $\theta_{i,t}$ can be formed from the median and the 2.5% and 97.5% empirical quantiles of the corresponding MCMC realizations. In examples, these will be graphed as a function of $t$ so that the adaptive nature of the model is apparent.
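The pointwise summaries just described reduce to quantiles taken across MCMC iterations for each day. A minimal sketch, assuming the draws for one person are stored as an array with one row per iteration; `ability_bands` is a hypothetical helper and the draws below are placeholders, not model output.

```python
import numpy as np

def ability_bands(draws):
    """Pointwise posterior summaries for one person's ability trajectory.

    draws: array of shape (n_mcmc, T); row m holds the m-th MCMC realization of
    (theta_{i,1}, ..., theta_{i,T}). Returns the 2.5%, 50% and 97.5% empirical
    quantiles for each day, each of shape (T,)."""
    lower, median, upper = np.quantile(np.asarray(draws), [0.025, 0.5, 0.975], axis=0)
    return lower, median, upper

# Placeholder draws standing in for real MCMC output (illustration only).
fake_draws = np.random.default_rng(0).normal(np.linspace(0.0, 1.0, 50), 0.1, size=(4000, 50))
lower, median, upper = ability_bands(fake_draws)
```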

4. Simulated examples. In this section a simulated example is used to 
illustrate the inferences from the proposed one-parameter DIR models and 
to study their properties, primarily from a frequentist perspective. 

The simulation examines the model's behavior for multiple individuals taking a series of tests that are scheduled during different time periods. In particular, suppose there are 10 individuals and each individual has taken tests on 50 different days. Thus, $n=10$ and $T_i=50$, for $i=1,\dots,10$. On each distinct test day, the individual takes four tests; thus, $S_{i,t}=4$ for $t=1,\dots,50$, $i=1,\dots,10$. Each test consists of 10 items, so that $K_{i,t,s}=10$ for $s=1,\dots,4$, $t=1,\dots,50$ and $i=1,\dots,10$. For the $i$th person, the time lapse between consecutive test days is assumed to be a function of the $t$th day, that is, $\Delta_{i,t}=10+t$ for $i=1,\dots,10$, $t=1,\dots,T_i/2$, and $\Delta_{i,t}=t-10$ for $t=T_i/2,\dots,T_i$. Finally, the unknown values of the parameters in the models are chosen as follows:

• $\phi=1/0.0218^2$, so the corresponding standard deviation of the random component $w_{i,t}$ in system equation (2.3) is $0.0218\sqrt{\Delta_{i,t}}$.

• $c=(0.0055, 0.0065, 0.0026, 0.0037, 0.0061, 0.0047, 0.0035, 0.0043, 0.0039, 0.0015)'$, where each element in the vector $c$ corresponds to the $i$th person's average growth rate, respectively, for $i=1,\dots,10$.

• $\delta=(2.0408, 1.3333, 1.8182, 1.2346, 1.5873, 1, 2.2222, 1.0526, 1.1494, 2)'$, where each element in the vector $\delta$ corresponds to the precision parameter of daily random effects for the $i$th person, respectively, $i=1,\dots,10$.

• $\tau=(4, 3.1250, 4.3478, 2.7027, 3.7037, 2.8571, 4, 2.2222, 9.0909, 4.5455)'$, where each element in the vector $\tau$ corresponds to the precision parameter of test random effects for the $i$th person, respectively, $i=1,\dots,10$.

According to the observation equation (2.2), we then simulated values for the unknown variables and set the test difficulties, $a_{i,t,s}$, to be $\theta_{i,t}$ plus a random perturbation uniformly distributed on $(-0.1,0.1)$. The values of $\varepsilon_{i,t,s,l}$ were drawn from $N(0,0.7333^2)$; the value 0.7333 is the one used in the test design for MetaMetrics. Finally, we chose $\rho=0.1180$, which is the value estimated by MetaMetrics in their studies [Hanlon et al. (2010)].
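The simulation design above can be reproduced in outline for a single individual with the sketch below. It uses the first individual's $c_i$, $\delta_i$ and $\tau_i$ from the lists above; the initial ability $\theta_{i,0}=0$ and the seed are not stated in the text and are arbitrary choices here, and the overlap of the two lapse formulas at $t=T_i/2$ is resolved, as an assumption, by using the first formula through $t=25$.

```python
import numpy as np

rng = np.random.default_rng(2010)         # arbitrary seed
T, S, K = 50, 4, 10                       # test days, tests per day, items per test
phi = 1.0 / 0.0218**2                     # system precision
rho, sigma, delta_max = 0.1180, 0.7333, 14.0
c, delta, tau = 0.0055, 2.0408, 4.0       # first individual's values from the lists above
theta0 = 0.0                              # assumption: initial ability not given in the text

t = np.arange(1, T + 1)
lapse = np.where(t <= T // 2, 10 + t, t - 10).astype(float)   # Delta_{i,t}
lapse_plus = np.minimum(lapse, delta_max)                     # Delta^+_{i,t}

theta = np.empty(T + 1)
theta[0] = theta0
for k in range(T):                                            # system equation (2.3)
    theta[k + 1] = (theta[k] + c * (1 - rho * theta[k]) * lapse_plus[k]
                    + rng.normal(0.0, np.sqrt(lapse[k] / phi)))

X = np.empty((T, S, K), dtype=int)
for k in range(T):                                            # observation equation (2.2)
    daily = rng.normal(0.0, 1.0 / np.sqrt(delta))             # varphi_{i,t}
    z = rng.normal(0.0, 1.0 / np.sqrt(tau), size=S)
    test = z - z.mean()                                       # eta_{i,t}, sums to zero
    a = theta[k + 1] + rng.uniform(-0.1, 0.1, size=S)         # a_{i,t,s}
    d = a[:, None] + rng.normal(0.0, sigma, size=(S, K))      # d_{i,t,s,l}, as in (2.1)
    p = 1.0 / (1.0 + np.exp(-(theta[k + 1] - d + daily + test[:, None])))
    X[k] = rng.uniform(size=(S, K)) < p
```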

From dichotomous data obtained from the simulation, the Bayesian ma- 
chinery from Section 3 was used in estimating the model parameters in (2.2) 
and (2.3). Figure 1 shows estimates of the ability trajectory for the 1st, 3rd, 
5th and 9th individuals. The red dots in the figures correspond to the estimated posterior median of the ability $\theta_{i,t}$ on the $t$th day for the $i$th person, and the red dashed lines give the 2.5% and 97.5% quantile trajectories of $\theta_{i,t}$, for $t=1,\dots,50$. The black dots are the true abilities on the $t$th day for the $i$th person in the simulation. The third trajectory is typical of what
is expected in terms of increasing ability, and is smoothly handled by the 
Bayesian machinery. The other three trajectories are highly nonmonotonic; 
the Bayesian estimates err in trying to be increasing (as they are designed 
to do), but do adapt to the nonmonotonicity when the evidence becomes 
strong enough. 

One method of evaluating the success of the inferential scheme is to evaluate the percentage of time that the true ability, $\theta_{i,t}$, is contained in the 95% credible interval of estimated ability for each individual. For the ten
individuals, these estimated coverages were 100%, 100%, 99%, 99%, 100%, 
100%, 94%, 100%, 100% and 91%, which produce an overall estimated cov- 
erage of 98.3%. Thus, while the inferential method is Bayesian, it seems to 
be yielding sets that have good frequentist coverage. 
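The coverage figures above come down to counting how often the true value falls inside the pointwise 95% band. A minimal sketch of that bookkeeping, with a hypothetical helper name; averaging the ten reported per-person figures reproduces the overall 98.3%.

```python
import numpy as np

def empirical_coverage(true_theta, lower, upper):
    """Fraction of entries whose true value lies inside [lower, upper] (equal-shape arrays)."""
    true_theta, lower, upper = map(np.asarray, (true_theta, lower, upper))
    inside = (lower <= true_theta) & (true_theta <= upper)
    return float(inside.mean())

# Per-person coverages reported in the text.
per_person = np.array([1.00, 1.00, 0.99, 0.99, 1.00, 1.00, 0.94, 1.00, 1.00, 0.91])
print(round(per_person.mean(), 3))
```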

To summarize the results for the $c_i$'s, $\tau_i^{-1/2}$'s and $\delta_i^{-1/2}$'s, we compare their true values with the corresponding estimated values in Figure 2. In these plots, the black bar represents the 95% credible interval of the posterior distribution, the blue plus stands for the estimated posterior median, and the red cross is the true value in the simulation. Moreover, the estimated posterior median of $\phi^{-1/2}$ is 0.0315 and its 95% credible interval is [0.0148, 0.0484]. Note that the true values of the $c_i$'s, $\tau_i^{-1/2}$'s, $\delta_i^{-1/2}$'s and $\phi^{-1/2}$ are all contained in the 95% credible intervals except $\tau_9^{-1/2}$; thus, the empirical coverage for these parameters is 96.77% (30 of the 31 parameters).

Fig. 1. Estimated and actual ability trajectories of 4 individuals from the simulated data.

Fig. 3. Retrospective estimates of ability trajectories of 4 individuals from the MetaMetrics data.



5. MetaMetrics testbed. In this section we apply the DIR model to the testbed MetaMetrics data. We consider a sample of 25 individuals from the data base of students in certain elementary schools in Mississippi; the characteristics of these students are described in Appendix B. The primary focus is on the goals mentioned in Section 1.2.

5.1. Retrospective estimation of ability growth. First consider retrospective estimation of the reading ability of an individual, using all the data recorded for that individual. Figure 3 presents the resulting growth trajectories for the 3rd, 12th, 17th and 25th individuals studied. In Figure 3 the red dots are the posterior median estimates of each individual's ability, and the red dashes correspond to the 2.5% and 97.5% quantiles of the posterior distributions of the abilities. The green dots are estimates of an individual's abilities obtained by setting the expected score, as a function of ability, equal to the observed score. These can roughly be thought of as the raw test scores put on the same scale as the $\theta_{i,t}$. The most interesting feature of these growth trajectories is that, although there typically does appear to be overall growth in ability, this growth need not be monotone. In particular, when there is a large time gap between successive tests, the ability appears to drop for some individuals. One natural explanation is that a student may not read during vacations and could actually lose ability. Another possible explanation is that the student has become less adept at using CAIT after a long break.

Fig. 4. 95% credible intervals of the $\tau_i^{-1/2}$'s, $\delta_i^{-1/2}$'s and $c_i$'s for the MetaMetrics data set.
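The raw-score estimates (the green dots in Figure 3) solve a one-dimensional monotone equation, so a bisection search is enough. The sketch below assumes a one-parameter logistic model with known item difficulties $d_l$ and solves $\sum_l F(\theta - d_l) =$ observed score for $\theta$. The helper name and the example difficulties are hypothetical, and MetaMetrics' actual procedure may differ in detail.

```python
import numpy as np

def raw_score_ability(score, difficulties, lo=-20.0, hi=20.0, tol=1e-10):
    """Ability at which the expected number-correct equals the observed score.

    Assumes E[score | theta] = sum_l F(theta - d_l) with F the logistic c.d.f.;
    this is strictly increasing in theta, so bisection applies. Requires
    0 < score < number of items (zero or perfect scores have no finite root).
    """
    d = np.asarray(difficulties, dtype=float)
    if not 0 < score < d.size:
        raise ValueError("score must be strictly between 0 and the number of items")

    def expected_score(theta):
        return np.sum(1.0 / (1.0 + np.exp(-(theta - d))))

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_score(mid) < score:
            lo = mid          # root lies above mid
        else:
            hi = mid          # root lies at or below mid
    return 0.5 * (lo + hi)

# Hypothetical 4-item test with difficulties symmetric about 0:
# answering half the items correctly gives an ability of (approximately) 0.
theta_hat = raw_score_ability(2, [-1.0, -0.5, 0.5, 1.0])
```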
Figure 4 summarizes, for $i = 1, \ldots, 25$, the posterior distributions of three quantities: the standard deviations of the test random effects, $\tau_i^{-1/2}$'s; the standard deviations of the daily random effects, $\delta_i^{-1/2}$'s; and the average growth rates, $c_i$'s. The estimated posterior median of $\phi^{-1/2}$ is 0.0612, with 95% credible interval [0.0477, 0.0757].
Figures 4(a) and (b) show that the standard deviations of both random effects are almost all quite large, with 95% credible intervals well separated from zero. Recall that these effects were included in the model to account for a possible lack of local independence. The evidence is thus strong that local independence is indeed not tenable for these data and that both types of random effects are present. The consistency of the standard deviations of the random effects across individuals is somewhat surprising, but it lends credence to the notion that random-effect modeling of the local dependence is fruitful.

5.2. On-line estimation of ability growth. On-line estimation of reading ability uses essentially the same model, but at each time point only the data up to that time are used. However, instead of treating $\phi^{-1/2}$ as unknown, we fix $\phi^{-1/2} = 0.0612$, the estimate from the retrospective analysis, because $\phi^{-1/2}$ cannot be estimated effectively in an on-line mode.

Applying the Bayesian methodology yields on-line posterior median ability estimates for the 25 individuals being studied, together with the 2.5% and 97.5% quantiles of the posterior distribution of their abilities. These are shown as the purple dots and dashed purple lines in Figure 5 for the 3rd, 12th, 17th and 25th individuals. As before, the green dots show the raw-score estimates of each individual's ability at each time point, and the red dots are the retrospective estimates discussed earlier. The figures also include, as blue dots, the ability estimates obtained from the current methodology of MetaMetrics, which is a partial Bayesian procedure.

As expected, the on-line ability estimates are much more variable than the retrospective estimates. The on-line estimates are sometimes also somewhat more variable than the current MetaMetrics estimates (the blue dots). This is because, at each on-line estimation point, the current MetaMetrics methodology uses a very tight prior (arising from the previous data) for the student's ability.

While we do not know the truth here, it is plausible that the retrospective 
red dots are our best guesses as to the true abilities, and we can then judge 
how well the various on-line procedures are doing relative to these best 
guesses. Our on-line estimates are generally closer to these retrospective 
estimates than the current MetaMetrics estimates (the 12th individual being 
the interesting exception). In fact, the average mean squared error of our 
on-line estimates relative to the retrospective estimates is 0.0851, while the 
average mean squared error of the current MetaMetrics estimates is 0.1311. 
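The average mean squared error quoted above can be computed as follows. The sketch rests on our reading that the squared differences are first averaged over each individual's time points and then over individuals. The example numbers are made up and are not MetaMetrics data.

```python
import numpy as np

def average_mse(online, retrospective):
    """Mean over individuals of the per-individual mean squared difference.

    online, retrospective -- lists with one 1-D array of ability estimates per
    individual (arrays may differ in length across individuals).
    """
    per_person = [np.mean((np.asarray(o) - np.asarray(r)) ** 2)
                  for o, r in zip(online, retrospective)]
    return float(np.mean(per_person))

# Hypothetical two-person example
online = [np.array([1.0, 1.2, 1.1]), np.array([0.5, 0.9])]
retro = [np.array([1.1, 1.1, 1.1]), np.array([0.7, 0.8])]
print(round(average_mse(online, retro), 6))   # 0.015833
```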

If we view the retrospective estimates (red dots) as surrogates for the truth, it is interesting to see how often they fall outside the on-line uncertainty bands (purple lines). This happened very rarely; individual 17 in Figure 5 is one case where it occasionally occurred. A final observation from Figure 5 is that the current MetaMetrics estimates are usually lower than our on-line estimates of the person's reading ability.
Fig. 5. On-line estimates of ability trajectories of 4 individuals from the MetaMetrics data.



6. Conclusions and generalizations. The evidence that the local independence assumption is violated in CAIT situations is generally strong. Using test and daily random effects to model the local dependence appears to be both necessary and successful. Embedding a dynamic linear model for an individual's ability trajectory within the logistic IRT structure provides a powerful and individually adaptive method for dealing with longitudinal testing data.

The retrospective DIR model analysis seems excellent for assessing ac- 
tual ability trajectories and, hence, is of considerable use in understanding 
population behavior, such as the frequently observed drops in ability after 
a long pause in testing. The on-line DIR analysis provides real-time ability 
estimates for assignments of material at the right difficulty level and other 
possible educational goals. 

A key advantage of the Bayesian framework adopted here is that uncertainty in all unknowns can be built into the model (e.g., uncertainty in the difficulty of the random test items), and the uncertainty of the estimates is available for all inferences. Prior information can also be built into the analysis, for example knowledge about ability distributions over the population and knowledge that general growth in ability is expected. This is done in a nondogmatic fashion that allows the data to overrule the prior.

Many extensions are possible, such as the already mentioned extension to two-parameter and three-parameter IRT models. If one also had data for individuals over a period of many years, including years near the maturation point of one's reading ability, it would be possible to include an individual-specific $\rho_i$ in the model.

APPENDIX A: THE MCMC COMPUTATION 

The MCMC scheme used to sample from the posterior (3.6) is a block Gibbs sampler that relies on the forward filtering and backward sampling algorithm at a key point. Because the scheme is block Gibbs sampling, we need only specify the conditional distribution of each block of variables given the data and the other unknown variables.

A.1. Sampling Y: Truncated normal distribution sampling. Given $\theta$, $\varphi$, $\eta$ and $\nu$, the latent variables $\{Y_{i,t,s,l}\}$ are sampled from

$$Y_{i,t,s,l} \sim \begin{cases} N^{+}\big(\theta_{i,t} - a_{i,t,s} + \varphi_{i,t} + \eta_{i,t,s},\ \psi_{i,t,s,l}^{-1}\big) & \text{if } X_{i,t,s,l} = 1,\\ N^{-}\big(\theta_{i,t} - a_{i,t,s} + \varphi_{i,t} + \eta_{i,t,s},\ \psi_{i,t,s,l}^{-1}\big) & \text{if } X_{i,t,s,l} = 0,\end{cases}$$

where $N^{+}$ denotes the normal distribution truncated at the left by zero, $N^{-}$ the normal distribution truncated at the right by zero, and $\psi_{i,t,s,l}^{-1} = 4\nu_{i,t,s,l}^2 + \sigma^2$. Sampling from truncated normals is fast and easy.
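Step A.1 maps directly onto a vectorized truncated-normal draw. The sketch below uses SciPy's `scipy.stats.truncnorm`, whose truncation bounds are given on the standardized scale, so the cut-off at zero becomes $-\text{mean}/\text{sd}$. The function name `sample_latent_y` is hypothetical.

```python
import numpy as np
from scipy.stats import truncnorm

def sample_latent_y(x, mean, psi_inv, rng):
    """Draw Y_{i,t,s,l} from N+ (x == 1) or N- (x == 0), truncated at zero.

    x       -- array of 0/1 responses
    mean    -- theta - a + varphi + eta for each response
    psi_inv -- conditional variances psi^{-1} (4 nu^2 + sigma^2 in the text)
    """
    x = np.asarray(x)
    mean = np.asarray(mean, dtype=float)
    sd = np.sqrt(np.asarray(psi_inv, dtype=float))
    z0 = (0.0 - mean) / sd                       # zero on the standardized scale
    lower = np.where(x == 1, z0, -np.inf)        # N+: support (0, inf)
    upper = np.where(x == 1, np.inf, z0)         # N-: support (-inf, 0)
    return truncnorm.rvs(lower, upper, loc=mean, scale=sd, random_state=rng)

rng = np.random.default_rng(1)
y = sample_latent_y([1, 0, 1], [0.3, 0.3, -2.0], [1.0, 1.0, 0.5], rng)
print(y)   # first and third entries positive, second negative
```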

A.2. Sampling θ: Forward filtering and backward sampling. For each individual, the latent ability vector $\theta_i = (\theta_{i,0}, \ldots, \theta_{i,T_i})$ is typically high-dimensional with highly correlated coordinates, so sampling it would appear to be very challenging. To overcome this obstacle, the model is transformed so that $\theta_i$ can be sampled as a block by the highly efficient forward filtering and backward sampling algorithm, within a Gibbs step conditional on the other parameters.

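To see this, treat $\phi$, $c$, $Y$, $\varphi$, $\eta$ and $\nu$ as given (the Gibbs sampling step). Define $\lambda_{i,t} = \theta_{i,t} - \rho^{-1}$ and $z_{i,t,s,l} = Y_{i,t,s,l} + a_{i,t,s} - \varphi_{i,t} - \eta_{i,t,s} - \rho^{-1}$, and use the formulation of the model in (2.4). The (conditional) one-parameter DIR model then fits the framework of dynamic linear models [West and Harrison (1997)], that is,

$$\text{System equation:}\quad \lambda_{i,t} = g_{i,t}\lambda_{i,t-1} + w_{i,t},$$

$$\text{Observation equation:}\quad z_{i,t,s,l} = \lambda_{i,t} + \epsilon^{*}_{i,t,s,l},$$

where $w_{i,t} \sim N(0, \phi^{-1}\Delta_{i,t})$ and $\epsilon^{*}_{i,t,s,l} \sim N(0, \psi_{i,t,s,l}^{-1})$ with $\psi_{i,t,s,l}^{-1} = 4\nu_{i,t,s,l}^2 + \sigma^2$. Because $1 - \rho\theta_{i,t-1} = -\rho\lambda_{i,t-1}$, the growth term $c_i(1 - \rho\theta_{i,t-1})\Delta^{+}_{i,t}$ is absorbed into $g_{i,t} = 1 - c_i\rho\Delta^{+}_{i,t}$. As indicated in West and Harrison (1997), the forward filtering and backward sampling algorithm block-updates each vector $\theta_i$ as follows. Since $\lambda_{i,0} = \theta_{i,0} - \rho^{-1}$ and $\theta_{i,0} \sim N(\mu_{G_i}, v_{G_i})$, the conditional prior for $\lambda_{i,0}$ is $\lambda_{i,0} \sim N(\mu_{G_i} - \rho^{-1}, v_{G_i})$. Define the information available on the $t$th day for the $i$th person as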

$$D_{i,t} = \{g_{i,q}, \phi, \varphi, \eta, \nu, c, z_{i,q,1,1}, \ldots, z_{i,q,S_{i,q},K_{i,q,S_{i,q}}}\}_{q=1}^{t}.$$

We claim that the posterior distribution of $\lambda_{i,t}$ is then

$$\lambda_{i,t} \mid D_{i,t} \sim N(\mu_{i,t}, V_{i,t}), \tag{A.1}$$

which can be verified by induction as follows. Assume that, on the $(t-1)$th day, the posterior of $\lambda_{i,t-1}$ given $D_{i,t-1}$ is $N(\mu_{i,t-1}, V_{i,t-1})$. This assumption clearly holds for $t = 1$, with $\mu_{i,0} = \mu_{G_i} - \rho^{-1}$ and $V_{i,0} = v_{G_i}$. The system equation then gives the prior $\lambda_{i,t} \mid D_{i,t-1} \sim N(d_{i,t}, R_{i,t})$, where $d_{i,t} = g_{i,t}\mu_{i,t-1}$ and $R_{i,t} = g_{i,t}^2 V_{i,t-1} + \phi^{-1}\Delta_{i,t}$. Therefore, we have

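$$\Pr(\lambda_{i,t} \mid D_{i,t}) \propto \Pr(\lambda_{i,t} \mid D_{i,t-1}) \prod_{s=1}^{S_{i,t}}\prod_{l=1}^{K_{i,t,s}} \Pr(z_{i,t,s,l} \mid \lambda_{i,t}) \propto \exp\Big\{-\frac{(\lambda_{i,t} - d_{i,t})^2}{2R_{i,t}}\Big\} \times \prod_{s=1}^{S_{i,t}}\prod_{l=1}^{K_{i,t,s}} \exp\Big\{-\frac{\psi_{i,t,s,l}(z_{i,t,s,l} - \lambda_{i,t})^2}{2}\Big\}.$$

Then, on the $t$th day, the posterior distribution of $\lambda_{i,t}$ is as in (A.1), where

$$\mu_{i,t} = V_{i,t}\Big(R_{i,t}^{-1}d_{i,t} + \sum_{s=1}^{S_{i,t}}\sum_{l=1}^{K_{i,t,s}} \psi_{i,t,s,l}\, z_{i,t,s,l}\Big), \qquad V_{i,t} = \Big(\sum_{s=1}^{S_{i,t}}\sum_{l=1}^{K_{i,t,s}} \psi_{i,t,s,l} + R_{i,t}^{-1}\Big)^{-1}.$$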

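The above updating procedure is called forward filtering. Once it is complete and all the quantities $\mu_{i,t}$ and $V_{i,t}$ have been saved, the backward sampling of $\lambda_{i,t}$ can begin. For $t = T_i$, we sample $\lambda_{i,T_i}$ directly from $N(\mu_{i,T_i}, V_{i,T_i})$. Then, for $t$ running from $T_i - 1$ down to 0, we draw $\lambda_{i,t}$ from

$$\lambda_{i,t} \mid \lambda_{i,t+1}, D_{i,t} \sim N(h_{i,t}, H_{i,t}),$$

where $h_{i,t} = H_{i,t}\big(V_{i,t}^{-1}\mu_{i,t} + \phi\, g_{i,t+1}\Delta_{i,t+1}^{-1}\lambda_{i,t+1}\big)$ and $H_{i,t} = \big(\phi\, g_{i,t+1}^2\Delta_{i,t+1}^{-1} + V_{i,t}^{-1}\big)^{-1}$. This follows from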

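$$\Pr(\lambda_{i,t} \mid \lambda_{i,t+1}, D_{i,t}) \propto \Pr(\lambda_{i,t} \mid D_{i,t})\Pr(\lambda_{i,t+1} \mid \lambda_{i,t}, D_{i,t}) \propto \exp\Big\{-\frac{(\lambda_{i,t} - \mu_{i,t})^2}{2V_{i,t}}\Big\} \times \exp\Big\{-\frac{\phi(\lambda_{i,t+1} - g_{i,t+1}\lambda_{i,t})^2}{2\Delta_{i,t+1}}\Big\}.$$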

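Thus, for $t = 0, \ldots, T_i$, we set $\theta_{i,t} = \lambda_{i,t} + \rho^{-1}$. Each vector $\theta_i$ is sampled as a whole block, since

$$\Pr(\theta_i \mid D_{i,T_i}) = \Pr(\theta_{i,T_i} \mid D_{i,T_i})\Pr(\theta_{i,T_i-1} \mid \theta_{i,T_i}, D_{i,T_i-1}) \cdots \Pr(\theta_{i,0} \mid \theta_{i,1}, D_{i,0}).$$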
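The forward filtering and backward sampling recursions of A.2 can be sketched in NumPy for a single individual. The sketch works under the conditional dynamic linear model of A.2, with all other parameters held fixed, and all names are hypothetical. `z[t-1]` and `psi[t-1]` hold the day-$t$ pseudo-observations $z_{i,t,s,l}$ and precisions $\psi_{i,t,s,l}$, flattened over tests and items.

```python
import numpy as np

def ffbs(z, psi, g, delta, phi, m0, v0, rng):
    """Forward filtering, backward sampling for lambda_{i,0..T} (one individual).

    z[t-1], psi[t-1]   -- day-t pseudo-observations and their precisions (arrays)
    g[t-1], delta[t-1] -- g_{i,t} and Delta_{i,t} for t = 1..T
    phi                -- precision parameter of the system equation
    m0, v0             -- prior mean and variance of lambda_{i,0}
    """
    T = len(z)
    mu = np.empty(T + 1)
    V = np.empty(T + 1)
    mu[0], V[0] = m0, v0
    for t in range(1, T + 1):                                # forward filtering
        d = g[t - 1] * mu[t - 1]                             # d_{i,t}
        R = g[t - 1] ** 2 * V[t - 1] + delta[t - 1] / phi    # R_{i,t}
        V[t] = 1.0 / (1.0 / R + np.sum(psi[t - 1]))          # V_{i,t}
        mu[t] = V[t] * (d / R + np.sum(psi[t - 1] * z[t - 1]))
    lam = np.empty(T + 1)
    lam[T] = rng.normal(mu[T], np.sqrt(V[T]))                # draw lambda_{i,T}
    for t in range(T - 1, -1, -1):                           # backward sampling
        w = phi * g[t] / delta[t]                            # phi g_{i,t+1} / Delta_{i,t+1}
        H = 1.0 / (w * g[t] + 1.0 / V[t])                    # H_{i,t}
        h = H * (mu[t] / V[t] + w * lam[t + 1])              # h_{i,t}
        lam[t] = rng.normal(h, np.sqrt(H))
    return lam

# Sanity check with nearly noise-free pseudo-observations (hypothetical values)
rng = np.random.default_rng(2)
T = 5
truth = np.linspace(-3.0, -2.0, T)                  # lambda_{i,1..T}
z = [np.full(4, lt) for lt in truth]                # 4 observations per day
psi = [np.full(4, 1e6) for _ in range(T)]           # very high precision
lam = ffbs(z, psi, g=np.full(T, 0.99), delta=np.ones(T), phi=1.0, m0=-3.0, v0=1.0, rng=rng)
theta = lam + 1 / 0.1180                            # back-transform with rho = 0.1180
```

Because the whole vector is drawn from its joint conditional in one pass, the strong autocorrelation between neighbouring $\theta_{i,t}$ does not slow the sampler down, as it would with one-at-a-time updates.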
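A.3. Sampling c: Truncated normal distribution sampling. When $\theta$ and $\phi$ are given, the full conditional distribution of $c_i$ is the truncated normal distribution

$$c_i \mid \theta, \phi \sim N^{+}\left(\frac{\sum_{t=1}^{T_i}(\theta_{i,t} - \theta_{i,t-1})(1 - \rho\theta_{i,t-1})\Delta^{+}_{i,t}\Delta_{i,t}^{-1}}{\sum_{t=1}^{T_i}\{\Delta^{+}_{i,t}(1 - \rho\theta_{i,t-1})\}^2\Delta_{i,t}^{-1}},\ \Big[\phi\sum_{t=1}^{T_i}\{\Delta^{+}_{i,t}(1 - \rho\theta_{i,t-1})\}^2\Delta_{i,t}^{-1}\Big]^{-1}\right).$$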
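A sketch of step A.3 follows. The hypothetical helper `sample_c` treats the system equation as a weighted regression of the increments $y_t = \theta_{i,t} - \theta_{i,t-1}$ on $x_t = \Delta^{+}_{i,t}(1 - \rho\theta_{i,t-1})$. It forms the mean $\sum_t x_t y_t/\Delta_{i,t} \big/ \sum_t x_t^2/\Delta_{i,t}$ and the precision $\phi\sum_t x_t^2/\Delta_{i,t}$, then truncates at zero, which corresponds to a flat prior on $c_i > 0$ (our reading of the $1_{\{c_i>0\}}$ factor in Appendix C).

```python
import numpy as np
from scipy.stats import truncnorm

def sample_c(theta, delta, delta_plus, phi, rho, rng):
    """Draw c_i from its truncated-normal full conditional N+(mean, var).

    theta      -- theta_{i,0..T}
    delta      -- Delta_{i,t}, t = 1..T
    delta_plus -- Delta^+_{i,t}, t = 1..T
    """
    theta = np.asarray(theta, dtype=float)
    x = np.asarray(delta_plus) * (1.0 - rho * theta[:-1])   # regressor x_t
    y = np.diff(theta)                                       # theta_t - theta_{t-1}
    info = np.sum(x ** 2 / delta)
    mean = np.sum(x * y / delta) / info
    sd = 1.0 / np.sqrt(phi * info)
    a = (0.0 - mean) / sd                                    # zero on the standardized scale
    return truncnorm.rvs(a, np.inf, loc=mean, scale=sd, random_state=rng)

# Noise-free trajectory generated with a known growth rate (hypothetical values)
rho, c_true = 0.1180, 0.02
theta = [0.5]
for _ in range(10):
    theta.append(theta[-1] + c_true * (1 - rho * theta[-1]) * 1.0)
ones = np.ones(10)
rng = np.random.default_rng(3)
c_draw = sample_c(theta, ones, ones, phi=1e8, rho=rho, rng=rng)
```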



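A.4. Sampling η: Multivariate normal distribution sampling. Suppose $\theta$, $\varphi$, $\tau$, $Y$ and $\nu$ are given and $S_{i,t} > 1$. Then the full conditional distribution of $\eta^{*}_{i,t} = (\eta_{i,t,1}, \ldots, \eta_{i,t,S_{i,t}-1})'$ is the multivariate normal distribution

$$\eta^{*}_{i,t} \mid \theta, \varphi, \tau, Y, \nu \sim N_{S_{i,t}-1}\big(\Omega_{i,t}^{-1}U_{i,t}'\Psi_{i,t}Y^{*}_{i,t},\ \Omega_{i,t}^{-1}\big), \qquad \Omega_{i,t} = \tau_i\Sigma_{i,t}^{-1} + U_{i,t}'\Psi_{i,t}U_{i,t}.$$

Here $\tau_i^{-1}\Sigma_{i,t}$ is the prior covariance matrix of $\eta^{*}_{i,t}$, and $\Psi_{i,t}$ is the diagonal matrix of the $\psi_{i,t,s,l}$. The response vector is

$$Y^{*}_{i,t} = (Y_{i,t,1,1} - \theta_{i,t} + a_{i,t,1} - \varphi_{i,t}, \ldots, Y_{i,t,1,K_{i,t,1}} - \theta_{i,t} + a_{i,t,1} - \varphi_{i,t}, \ldots, Y_{i,t,S_{i,t},K_{i,t,S_{i,t}}} - \theta_{i,t} + a_{i,t,S_{i,t}} - \varphi_{i,t})',$$

and the design matrix is

$$U_{i,t} = \begin{pmatrix} 1_{K_{i,t,1}} & 0 & \cdots & 0\\ 0 & 1_{K_{i,t,2}} & \cdots & 0\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & 1_{K_{i,t,S_{i,t}-1}}\\ -1_{K_{i,t,S_{i,t}}} & -1_{K_{i,t,S_{i,t}}} & \cdots & -1_{K_{i,t,S_{i,t}}} \end{pmatrix}_{(\sum_{s=1}^{S_{i,t}}K_{i,t,s})\times(S_{i,t}-1)},$$

where $1_K$ is a $K$-dimensional column vector with every element equal to 1, and $\eta_{i,t,S_{i,t}} = -\sum_{s=1}^{S_{i,t}-1}\eta_{i,t,s}$. When $S_{i,t} = 1$, $\eta_{i,t,S_{i,t}} = 0$.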
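Step A.4 is a standard conjugate Gaussian regression of the stacked responses on the test effects, under a sum-to-zero constraint. The sketch below takes the prior precision of $\eta^{*}_{i,t}$ to be $\tau_i(I + \mathbf{1}\mathbf{1}')$. That is what $S_{i,t}$ independent $N(0, \tau_i^{-1})$ effects constrained to sum to zero would give, but it is our assumption rather than a statement of the paper's prior. The function name is hypothetical.

```python
import numpy as np

def sample_eta_day(ystar, psi, n_items, tau, rng):
    """Draw (eta_{i,t,1}, ..., eta_{i,t,S}) for one day under a sum-to-zero constraint.

    ystar   -- Y* entries (Y - theta + a - varphi), stacked test by test
    psi     -- matching precisions psi_{i,t,s,l}
    n_items -- K_{i,t,s} for s = 1..S
    Assumption: prior precision of eta* is tau * (I + 1 1').
    """
    ystar = np.asarray(ystar, dtype=float)
    psi = np.asarray(psi, dtype=float)
    S = len(n_items)
    if S == 1:
        return np.zeros(1)                        # a single test: eta = 0
    # Design matrix: rows of test s < S load on eta_s; rows of test S load on -eta_1..-eta_{S-1}
    U = np.zeros((sum(n_items), S - 1))
    row = 0
    for s, k in enumerate(n_items):
        U[row:row + k, :] = -1.0 if s == S - 1 else np.eye(S - 1)[s]
        row += k
    prior_prec = tau * (np.eye(S - 1) + np.ones((S - 1, S - 1)))
    omega = prior_prec + U.T @ (psi[:, None] * U)  # posterior precision
    cov = np.linalg.inv(omega)
    mean = cov @ (U.T @ (psi * ystar))
    eta_star = rng.multivariate_normal(mean, cov)
    return np.append(eta_star, -eta_star.sum())   # eta_{i,t,S} = -(sum of the others)

rng = np.random.default_rng(4)
eta = sample_eta_day(np.array([0.5] * 3 + [-0.5] * 3), np.ones(6), [3, 3], tau=1.0, rng=rng)
```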



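A.5. Sampling τ: Gamma distribution sampling. When $\eta$ is given, the full conditional distribution of $\tau_i$ is the gamma distribution (shape, rate)

$$\tau_i \mid \eta \sim \mathrm{Ga}\left(\frac{\sum_{t=1}^{T_i}(S_{i,t} - 1) - 1}{2},\ \frac{1}{2}\sum_{t=1}^{T_i}\eta^{*\prime}_{i,t}\Sigma_{i,t}^{-1}\eta^{*}_{i,t}\right).$$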



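A.6. Sampling φ: Normal distribution sampling. When $\theta$, $\eta$, $\delta$, $Y$ and $\nu$ are given, the full conditional distribution of $\varphi_{i,t}$ is the normal distribution

$$\varphi_{i,t} \mid \theta, \eta, \delta, Y, \nu \sim N\left(\frac{\sum_{s=1}^{S_{i,t}}\sum_{l=1}^{K_{i,t,s}}\psi_{i,t,s,l}(Y_{i,t,s,l} - \theta_{i,t} + a_{i,t,s} - \eta_{i,t,s})}{\sum_{s=1}^{S_{i,t}}\sum_{l=1}^{K_{i,t,s}}\psi_{i,t,s,l} + \delta_i},\ \frac{1}{\sum_{s=1}^{S_{i,t}}\sum_{l=1}^{K_{i,t,s}}\psi_{i,t,s,l} + \delta_i}\right).$$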
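Step A.6 is a scalar precision-weighted update: the prior precision $\delta_i$ is added to the data precisions, and the mean is the precision-weighted average of the adjusted responses. A minimal sketch with a hypothetical function name:

```python
import numpy as np

def sample_varphi(resid, psi, delta_i, rng):
    """Draw varphi_{i,t} ~ N(sum(psi*resid)/(sum(psi)+delta_i), 1/(sum(psi)+delta_i)).

    resid -- Y_{i,t,s,l} - theta_{i,t} + a_{i,t,s} - eta_{i,t,s} for all items that day
    psi   -- matching precisions psi_{i,t,s,l}
    """
    resid = np.asarray(resid, dtype=float)
    psi = np.asarray(psi, dtype=float)
    prec = np.sum(psi) + delta_i                 # posterior precision
    return rng.normal(np.sum(psi * resid) / prec, 1.0 / np.sqrt(prec))

rng = np.random.default_rng(8)
v = sample_varphi(np.full(5, 0.3), np.full(5, 1e8), delta_i=2.0, rng=rng)
```

With very precise data (large $\psi$) the draw concentrates at the common residual. With no data it reduces to the prior $N(0, \delta_i^{-1})$.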
A.7. Sampling δ: Gamma distribution sampling. When $\varphi$ is given, the full conditional distribution of $\delta_i$ is the gamma distribution (shape, rate)

$$\delta_i \mid \varphi \sim \mathrm{Ga}\left(\frac{T_i - 1}{2},\ \frac{1}{2}\sum_{t=1}^{T_i}\varphi_{i,t}^2\right).$$



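A.8. Sampling φ: Gamma distribution sampling. When $\theta$ and $c$ are given, the full conditional distribution of $\phi$ is the gamma distribution (shape, rate)

$$\phi \mid \theta, c \sim \mathrm{Ga}\left(\frac{\sum_{i=1}^{n}T_i - 1}{2},\ \sum_{i=1}^{n}\sum_{t=1}^{T_i}\frac{\{\theta_{i,t} - \theta_{i,t-1} - c_i(1 - \rho\theta_{i,t-1})\Delta^{+}_{i,t}\}^2}{2\Delta_{i,t}}\right).$$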
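The gamma steps only require care with the parameterization. The conditionals here are in shape-rate form, whereas NumPy's `Generator.gamma` takes a shape and a scale equal to $1/\text{rate}$. The sketch below covers the $\delta_i$ update of A.7, $\delta_i \mid \varphi \sim \mathrm{Ga}((T_i-1)/2, \sum_t \varphi_{i,t}^2/2)$, with hypothetical helper names; the same helper would serve the $\tau_i$ and $\phi$ updates.

```python
import numpy as np

def gamma_shape_rate(shape, rate, rng, size=None):
    """Draw from Ga(shape, rate); NumPy parameterizes by scale = 1 / rate."""
    return rng.gamma(shape, 1.0 / rate, size=size)

def sample_delta_i(varphi_i, rng):
    """delta_i | varphi ~ Ga((T_i - 1)/2, sum_t varphi_{i,t}^2 / 2)."""
    varphi_i = np.asarray(varphi_i, dtype=float)
    return gamma_shape_rate((varphi_i.size - 1) / 2.0, np.sum(varphi_i ** 2) / 2.0, rng)

rng = np.random.default_rng(5)
varphi_i = np.full(41, 0.1)                 # hypothetical daily effects, T_i = 41
# For these inputs shape = 20 and rate = 0.205, so the conditional mean is about 97.56
draws = gamma_shape_rate(20.0, 0.205, rng, size=20000)
print(draws.mean())
```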

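A.9. Sampling ν: Metropolis-Hastings sampling. Given $Y$, $\theta$, $\varphi$ and $\eta$, the full conditional density of $\nu_{i,t,s,l}$ is proportional to

$$\pi(\nu_{i,t,s,l} \mid Y, \theta, \varphi, \eta) \propto \frac{\pi_{KS}(\nu_{i,t,s,l})}{\sqrt{4\nu_{i,t,s,l}^2 + \sigma^2}}\exp\left\{-\frac{(Y_{i,t,s,l} - \theta_{i,t} + a_{i,t,s} - \varphi_{i,t} - \eta_{i,t,s})^2}{2(4\nu_{i,t,s,l}^2 + \sigma^2)}\right\},$$

where $\pi_{KS}(\cdot)$ is the K-S density defined in (3.2). This density is not in closed form, so we use a Metropolis-Hastings scheme to sample from it. A suitable proposal for $\nu$ is the K-S distribution itself. We therefore first sample $\nu^{*}$ from the K-S distribution and then set

$$\nu_{i,t,s,l}^{(M)} = \begin{cases}\nu^{*} & \text{with probability } \min(1, LR),\\ \nu_{i,t,s,l}^{(M-1)} & \text{otherwise},\end{cases}$$

where, given $Y$, $\theta$, $\varphi$ and $\eta$,

$$LR = \sqrt{\frac{4(\nu_{i,t,s,l}^{(M-1)})^2 + \sigma^2}{4(\nu^{*})^2 + \sigma^2}}\ \exp\left\{-\frac{r_{i,t,s,l}^2}{2\{4(\nu^{*})^2 + \sigma^2\}} + \frac{r_{i,t,s,l}^2}{2\{4(\nu_{i,t,s,l}^{(M-1)})^2 + \sigma^2\}}\right\},$$

with $r_{i,t,s,l} = Y_{i,t,s,l} - \theta_{i,t} + a_{i,t,s} - \varphi_{i,t} - \eta_{i,t,s}$, and $M$ denotes the $M$th iteration of the MCMC.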
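The following is a sketch of the independence Metropolis-Hastings step of A.9. It makes two assumptions: that the K-S distribution of (3.2) is the Kolmogorov distribution, available in SciPy as `scipy.stats.kstwobign`, and that the conditional variance is $4\nu^2 + \sigma^2$. Because the proposal equals the K-S prior, the acceptance ratio reduces to the ratio of the two normal likelihoods. The function name is hypothetical.

```python
import numpy as np
from scipy.stats import kstwobign

def update_nu(nu_old, resid, sigma, rng):
    """One independence Metropolis-Hastings step for nu_{i,t,s,l}.

    resid -- r = Y - theta + a - varphi - eta for this response
    sigma -- sd of the item-level deviation (0.7333 in the simulation)
    """
    nu_new = kstwobign.rvs(random_state=rng)     # proposal from the K-S distribution
    var_new = 4.0 * nu_new ** 2 + sigma ** 2
    var_old = 4.0 * nu_old ** 2 + sigma ** 2
    # log LR = log N(r; 0, var_new) - log N(r; 0, var_old)
    log_lr = (0.5 * np.log(var_old / var_new)
              - resid ** 2 / (2.0 * var_new) + resid ** 2 / (2.0 * var_old))
    if rng.uniform() < np.exp(min(0.0, log_lr)):
        return nu_new                            # accept
    return nu_old                                # reject: keep the current value

rng = np.random.default_rng(7)
nu = 1.0
for _ in range(200):
    nu = update_nu(nu, resid=0.4, sigma=0.7333, rng=rng)
```

Drawing the proposal from the prior avoids tuning a step size. The cost is a lower acceptance rate when a residual is far in the tails.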

A.10. Implementation. The Gibbs sampler starts at A.1 with initial values $\theta^{(0)}$, $c^{(0)}$, $\phi^{(0)}$, $\varphi^{(0)}$, $\eta^{(0)}$, $\delta^{(0)}$, $\tau^{(0)}$ and $\nu^{(0)}$, and then cycles through steps A.1 to A.9 until the MCMC has converged. The initial values used in the applications were $\theta^{(0)} = \mathbf{0}$, $c^{(0)} = \mathbf{0}$, $\phi^{(0)} = 1$, $\varphi^{(0)} = \mathbf{0}$, $\eta^{(0)} = \mathbf{0}$, $\delta^{(0)} = \mathbf{1}$, $\tau^{(0)} = \mathbf{1}$ and $\nu^{(0)} = \mathbf{1}$, where a bold $\mathbf{a}$ indicates that every element of the corresponding vector or set has the same value $a$. Convergence was assessed informally from trace plots. In the examples it occurred within the first 30,000 of the 50,000 iterations.
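Posterior summaries such as the medians and 95% intervals reported in Sections 4 and 5 are computed from the draws retained after burn-in. The sketch below discards the first 30,000 of 50,000 draws, following the convergence statement above. The chain is synthetic stand-in data, not output of the DIR sampler, and the helper name is hypothetical.

```python
import numpy as np

def summarize_chain(draws, burn_in=30_000):
    """Posterior median and 95% equal-tailed interval from post-burn-in draws."""
    kept = np.asarray(draws)[burn_in:]
    lo, med, hi = np.percentile(kept, [2.5, 50.0, 97.5])
    return med, (lo, hi)

rng = np.random.default_rng(6)
chain = rng.normal(0.0612, 0.007, size=50_000)   # stand-in draws for phi^{-1/2}
med, (lo, hi) = summarize_chain(chain)
```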
Table 1
Characteristics of the 25 considered individuals from the MetaMetrics data

Individual  Total tests  Days  Max. tests/day  Range of items/test  Max. gap  Grade
No. 1       147          73    8               4-25                 105       4
No. 2       162          64    9               3-17                 102       2
No. 3       118          77    4               3-21                 87        2
No. 4       93           53    4               5-25                 147       2
No. 5       114          89    3               6-25                 109       2
No. 6       157          57    29              4-20                 116       2
No. 7       153          63    7               4-20                 97        2
No. 8       60           50    5               3-24                 168       6
No. 9       135          53    7               4-24                 93        2
No. 10      137          54    6               4-17                 219       1
No. 11      214          100   11              3-18                 108       2
No. 12      113          76    4               4-16                 45        2
No. 13      95           65    4               4-14                 113       2
No. 14      116          57    6               5-17                 107       2
No. 15      155          71    9               4-20                 107       1
No. 16      247          76    13              3-19                 113       2
No. 17      254          76    12              3-18                 107       2
No. 18      304          53    31              3-12                 49        2
No. 19      167          83    5               3-23                 58        2
No. 20      101          68    9               4-23                 117       2
No. 21      88           58    9               3-23                 110       2
No. 22      220          96    8               2-23                 104       3
No. 23      80           66    6               2-25                 93        6
No. 24      105          60    6               6-24                 62        3
No. 25      218          74    12              3-25                 113       2



APPENDIX B: CHARACTERISTICS OF 25 STUDIED INDIVIDUALS 

Twenty-five individuals from the MetaMetrics data base are studied in 
detail; the characteristics of the data for these individuals are described in 
Table 1. 

APPENDIX C: POSTERIOR PROPRIETY 

Theorem 1. Suppose $n \ge 2$ and, for $i = 1, \ldots, n$, $T_i \ge 2$ and $S_{i,t} \ge 2$ for at least two days $t \in \{1, \ldots, T_i\}$. Suppose further that, on each of these two days, at least two of the tests have at least one 0 and one 1 observation. Then the posterior density of the DIR model is proper.



We first give some needed lemmas that may be of independent interest 
for proving posterior propriety in other logistic modeling scenarios. Proofs 
of these lemmas are given in Appendix A of Wang (2012). 
Lemma 2. For any three real numbers $x$, $e_1$ and $e_2$,

$$\frac{e^{x+e_1}}{1 + e^{x+e_1}} \cdot \frac{1}{1 + e^{x+e_2}} \le e^{-|x| + |e_1| + |e_2|}.$$
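Read as the bound $\frac{e^{x+e_1}}{1+e^{x+e_1}}\cdot\frac{1}{1+e^{x+e_2}} \le e^{-|x|+|e_1|+|e_2|}$, Lemma 2 follows by bounding one factor by 1 and the other by an exponential, depending on the sign of $x$. For $x \ge 0$, $1/(1+e^{x+e_2}) < e^{-x-e_2} \le e^{-x+|e_2|}$; for $x < 0$, $e^{x+e_1}/(1+e^{x+e_1}) < e^{x+e_1} \le e^{-|x|+|e_1|}$. A quick numerical spot check on a grid (an illustration, not a proof):

```python
import numpy as np

def lemma2_gap(x, e1, e2):
    """Right-hand side minus left-hand side of the bound; should be >= 0."""
    lhs = (1.0 / (1.0 + np.exp(-(x + e1)))) * (1.0 / (1.0 + np.exp(x + e2)))
    rhs = np.exp(-np.abs(x) + np.abs(e1) + np.abs(e2))
    return rhs - lhs

grid = np.linspace(-8.0, 8.0, 41)
X, E1, E2 = np.meshgrid(grid, grid, grid, indexing="ij")
print(lemma2_gap(X, E1, E2).min() >= 0.0)   # True
```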



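Lemma 3. For $\theta_i \in (-\infty, \infty)$, $i = 1, 2$,

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_0^{\infty} \tau^{-1/2}e^{-\tau(\eta_1^2 + \eta_2^2)}e^{-(|\theta_1 + \eta_1| + |\theta_1 - \eta_1| + |\theta_2 + \eta_2| + |\theta_2 - \eta_2|)}\,d\tau\,d\eta_1\,d\eta_2 \le K e^{-(|\theta_1| + |\theta_2|)}$$

with some constant $K$.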

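Lemma 4. For $\theta_i \in (-\infty, \infty)$, $i = 1, 2$,

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_0^{\infty} \delta^{-1/2}e^{-(\delta/2)(\varphi_1^2 + \varphi_2^2)}e^{-(|\theta_1 + \varphi_1| + |\theta_2 + \varphi_2|)}\,d\delta\,d\varphi_1\,d\varphi_2 < \frac{K}{1 + |\theta_2|}$$

with some constant $K$.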



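Lemma 5. For $T \ge 2$,

$$\int_0^{\infty}\int_0^{\infty}\int_0^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \frac{1}{\phi^{3/2}}\,\frac{e^{-z^2/2}}{1 + |\sqrt{B(c)/\phi}\,z + A(c)|}\,\frac{e^{-z'^2/2}}{1 + |\sqrt{B'(c')/\phi}\,z' + A'(c')|}\,dz\,dz'\,dc\,dc'\,d\phi < \infty,$$

where

$$A(c) = \mu_G\prod_{t=1}^{T}(1 - c\rho\Delta^{+}_t) + \sum_{t=1}^{T}c\Delta^{+}_t\prod_{j=t+1}^{T}(1 - c\rho\Delta^{+}_j),$$

$$B(c) = \sum_{t=1}^{T}\Delta_t\prod_{j=t+1}^{T}(1 - c\rho\Delta^{+}_j)^2 + \phi v_G\prod_{t=1}^{T}(1 - c\rho\Delta^{+}_t)^2.$$

$A'(c')$ and $B'(c')$ are defined in the same way for a second individual, with $c'$ and that individual's $\mu_G$, $v_G$, $\Delta_t$ and $\Delta^{+}_t$. We have dropped the label $i$ in the subscripts of $\Delta_{i,t}$, $G_i$, $\mu_{G_i}$ and $v_{G_i}$.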
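In the proof, $A(c)$ and $B(c)/\phi$ play the role of the mean and variance of $\theta_T$ implied by the system equation $\theta_t = (1 - c\rho\Delta^{+}_t)\theta_{t-1} + c\Delta^{+}_t + w_t$, $w_t \sim N(0, \phi^{-1}\Delta_t)$, started from $\theta_0 \sim N(\mu_G, v_G)$. The sketch below checks this reading numerically. It compares the closed forms $A(c) = \mu_G\prod_t g_t + \sum_t c\Delta^{+}_t\prod_{j>t}g_j$ and $B(c) = \sum_t \Delta_t\prod_{j>t}g_j^2 + \phi v_G\prod_t g_t^2$, where $g_t = 1 - c\rho\Delta^{+}_t$, against a step-by-step moment recursion. All inputs are hypothetical.

```python
import numpy as np

def A_B(c, rho, mu_G, v_G, phi, delta, delta_plus):
    """Closed forms A(c) and B(c)."""
    g = 1.0 - c * rho * np.asarray(delta_plus)
    T = len(g)
    tail = [np.prod(g[t + 1:]) for t in range(T)]      # prod_{j > t} g_j
    A = mu_G * np.prod(g) + sum(c * delta_plus[t] * tail[t] for t in range(T))
    B = sum(delta[t] * tail[t] ** 2 for t in range(T)) + phi * v_G * np.prod(g) ** 2
    return A, B

def moments_by_recursion(c, rho, mu_G, v_G, phi, delta, delta_plus):
    """Mean and variance of theta_T from the system equation, step by step."""
    m, v = mu_G, v_G
    for d, dp in zip(delta, delta_plus):
        g = 1.0 - c * rho * dp
        m = g * m + c * dp               # E[theta_t]
        v = g ** 2 * v + d / phi         # Var[theta_t]
    return m, v

delta = np.array([1.0, 3.0, 2.0, 7.0])   # hypothetical gaps
delta_plus = delta.copy()
phi = 400.0
A, B = A_B(0.02, 0.1180, 0.5, 0.3, phi, delta, delta_plus)
m, v = moments_by_recursion(0.02, 0.1180, 0.5, 0.3, phi, delta, delta_plus)
print(np.isclose(A, m), np.isclose(B / phi, v))   # True True
```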
Lemma 6. For $T \ge 2$,

$$\int_0^{\infty}\int_{-\infty}^{\infty} \frac{1}{1 + |\sqrt{B(c)/\phi}\,z + A(c)|}\exp\Big\{-\frac{z^2}{2}\Big\}\,dz\,dc < \infty,$$

with $A(c)$ and $B(c)$ defined in Lemma 5.

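Proof. In proving posterior propriety, it is easiest to work with the posterior density without the data augmentation, namely,

$$
\begin{aligned}
\pi(\theta, c, \tau, \eta, \epsilon, \varphi, \delta, \phi \mid X) \propto{}& \frac{1}{\phi^{3/2}}\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi v_{G_i}}}\exp\Big\{-\frac{(\theta_{i,0} - \mu_{G_i})^2}{2v_{G_i}}\Big\}\frac{1_{\{c_i > 0\}}}{\tau_i^{3/2}\delta_i^{3/2}}\prod_{i=1}^{n}\prod_{t=1}^{T_i}\prod_{s=1}^{S_{i,t}}\prod_{l=1}^{K_{i,t,s}}\frac{1}{\sqrt{2\pi}\sigma}\exp\Big(-\frac{\epsilon_{i,t,s,l}^2}{2\sigma^2}\Big)\\
&\times\prod_{i=1}^{n}\prod_{t=1}^{T_i}\Big(\frac{\delta_i}{2\pi}\Big)^{1/2}\exp\Big(-\frac{\delta_i\varphi_{i,t}^2}{2}\Big)\prod_{i=1}^{n}\prod_{t=1}^{T_i}\tau_i^{(S_{i,t}-1)/2}\exp\Big(-\frac{\tau_i\eta^{*\prime}_{i,t}\Sigma_{i,t}^{-1}\eta^{*}_{i,t}}{2}\Big)\\
&\times\prod_{i=1}^{n}\prod_{t=1}^{T_i}\prod_{s=1}^{S_{i,t}}\prod_{l=1}^{K_{i,t,s}}\frac{\exp[X_{i,t,s,l}(\theta_{i,t} - a_{i,t,s} + \varphi_{i,t} + \eta_{i,t,s} + \epsilon_{i,t,s,l})]}{1 + \exp(\theta_{i,t} - a_{i,t,s} + \varphi_{i,t} + \eta_{i,t,s} + \epsilon_{i,t,s,l})}\\
&\times\prod_{i=1}^{n}\prod_{t=1}^{T_i}\sqrt{\frac{\phi}{2\pi\Delta_{i,t}}}\exp\Big(-\frac{\phi\{\theta_{i,t} - \theta_{i,t-1} - c_i(1 - \rho\theta_{i,t-1})\Delta^{+}_{i,t}\}^2}{2\Delta_{i,t}}\Big).
\end{aligned}
\tag{C.1}
$$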

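Note that

$$\frac{\exp[X_{i,t,s,l}(\theta_{i,t} - a_{i,t,s} + \varphi_{i,t} + \eta_{i,t,s} + \epsilon_{i,t,s,l})]}{1 + \exp(\theta_{i,t} - a_{i,t,s} + \varphi_{i,t} + \eta_{i,t,s} + \epsilon_{i,t,s,l})} \le 1.$$

An upper bound on the posterior density can therefore be found by dropping all of these terms except those for the 0 and 1 observations in the assumed tests for each individual. Applying Lemma 2 to each such pair of 0 and 1 observations then gives the following upper bound on the posterior density (C.1):

$$
\begin{aligned}
&\frac{1}{\phi^{3/2}}\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi v_{G_i}}}\exp\Big\{-\frac{(\theta_{i,0} - \mu_{G_i})^2}{2v_{G_i}}\Big\}\frac{1_{\{c_i > 0\}}}{\tau_i^{3/2}\delta_i^{3/2}}\prod_{i=1}^{n}\prod_{t=1}^{T_i}\prod_{s=1}^{S_{i,t}}\prod_{l=1}^{K_{i,t,s}}\frac{1}{\sqrt{2\pi}\sigma}\exp\Big(-\frac{\epsilon_{i,t,s,l}^2}{2\sigma^2}\Big)\\
&\times\prod_{i=1}^{n}\prod_{t=1}^{T_i}\Big(\frac{\delta_i}{2\pi}\Big)^{1/2}\exp\Big(-\frac{\delta_i\varphi_{i,t}^2}{2}\Big)\prod_{i=1}^{n}\prod_{t=1}^{T_i}\tau_i^{(S_{i,t}-1)/2}\exp\Big(-\frac{\tau_i\eta^{*\prime}_{i,t}\Sigma_{i,t}^{-1}\eta^{*}_{i,t}}{2}\Big)\\
&\times\prod_{i=1}^{n}\Big[\exp(-|\theta_{i,t_1} + \varphi_{i,t_1} + \eta_{i,t_1,m}| + |a_{i,t_1,m}| + |\epsilon_{i,t_1,m,k}| + |\epsilon_{i,t_1,m,k'}|)\\
&\qquad\times\exp(-|\theta_{i,t_1} + \varphi_{i,t_1} + \eta_{i,t_1,m'}| + |a_{i,t_1,m'}| + |\epsilon_{i,t_1,m',h}| + |\epsilon_{i,t_1,m',h'}|)\\
&\qquad\times\exp(-|\theta_{i,t'} + \varphi_{i,t'} + \eta_{i,t',r}| + |a_{i,t',r}| + |\epsilon_{i,t',r,q}| + |\epsilon_{i,t',r,q'}|)\\
&\qquad\times\exp(-|\theta_{i,t'} + \varphi_{i,t'} + \eta_{i,t',r'}| + |a_{i,t',r'}| + |\epsilon_{i,t',r',g}| + |\epsilon_{i,t',r',g'}|)\Big]\\
&\times\prod_{i=1}^{n}\prod_{t=1}^{T_i}\sqrt{\frac{\phi}{2\pi\Delta_{i,t}}}\exp\Big(-\frac{\phi\{\theta_{i,t} - \theta_{i,t-1} - c_i(1 - \rho\theta_{i,t-1})\Delta^{+}_{i,t}\}^2}{2\Delta_{i,t}}\Big).
\end{aligned}
\tag{C.2}
$$

Here, for individual $i$, $t_1$ and $t'$ are the two days in Theorem 1. The tests $m, m'$ are taken on day $t_1$ and the tests $r, r'$ on day $t'$. The index pairs $(k, k')$, $(h, h')$, $(q, q')$ and $(g, g')$ label the 0 and 1 observations on these four tests.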

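Ignoring multiplicative constants, we integrate out all the $\epsilon_{i,t,s,l}$. Each integral $\int e^{|\epsilon|}e^{-\epsilon^2/(2\sigma^2)}\,d\epsilon$ is finite and the $|a_{i,t,s}|$ terms are constants, so (C.2) has an upper bound of

$$
\begin{aligned}
&\frac{1}{\phi^{3/2}}\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi v_{G_i}}}\exp\Big\{-\frac{(\theta_{i,0} - \mu_{G_i})^2}{2v_{G_i}}\Big\}\frac{1_{\{c_i > 0\}}}{\tau_i^{3/2}\delta_i^{3/2}}\prod_{i=1}^{n}\prod_{t=1}^{T_i}\Big(\frac{\delta_i}{2\pi}\Big)^{1/2}\exp\Big(-\frac{\delta_i\varphi_{i,t}^2}{2}\Big)\\
&\times\prod_{i=1}^{n}\prod_{t=1}^{T_i}\tau_i^{(S_{i,t}-1)/2}\exp\Big(-\frac{\tau_i\eta^{*\prime}_{i,t}\Sigma_{i,t}^{-1}\eta^{*}_{i,t}}{2}\Big)\\
&\times\prod_{i=1}^{n}\exp\{-|\theta_{i,t_1} + \varphi_{i,t_1} + \eta_{i,t_1,m}|\}\exp\{-|\theta_{i,t_1} + \varphi_{i,t_1} + \eta_{i,t_1,m'}|\}\\
&\qquad\times\exp\{-|\theta_{i,t'} + \varphi_{i,t'} + \eta_{i,t',r}|\}\exp\{-|\theta_{i,t'} + \varphi_{i,t'} + \eta_{i,t',r'}|\}\\
&\times\prod_{i=1}^{n}\prod_{t=1}^{T_i}\sqrt{\frac{\phi}{2\pi\Delta_{i,t}}}\exp\Big(-\frac{\phi\{\theta_{i,t} - \theta_{i,t-1} - c_i(1 - \rho\theta_{i,t-1})\Delta^{+}_{i,t}\}^2}{2\Delta_{i,t}}\Big).
\end{aligned}
\tag{C.3}
$$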
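We consider here only the "least information" case in which $S_{i,t_1} = S_{i,t'} = 2$; the more general case can be handled similarly. Then $\eta_{i,t_1,m} = -\eta_{i,t_1,m'}$, $\eta_{i,t',r} = -\eta_{i,t',r'}$, and

$$\exp(-\tau_i\eta^{*\prime}_{i,t_1}\Sigma_{i,t_1}^{-1}\eta^{*}_{i,t_1}/2) = \exp(-\tau_i\eta_{i,t_1,m}^2), \qquad \exp(-\tau_i\eta^{*\prime}_{i,t'}\Sigma_{i,t'}^{-1}\eta^{*}_{i,t'}/2) = \exp(-\tau_i\eta_{i,t',r}^2).$$

We substitute these into (C.3) and integrate out all $\eta$ other than $\eta_{i,t_1,m}$ and $\eta_{i,t',r}$, and all $\varphi$ other than $\varphi_{i,t_1}$ and $\varphi_{i,t'}$. This gives, up to multiplicative constants, the expression

$$
\begin{aligned}
&\frac{1}{\phi^{3/2}}\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi v_{G_i}}}\exp\Big\{-\frac{(\theta_{i,0} - \mu_{G_i})^2}{2v_{G_i}}\Big\}\frac{1_{\{c_i > 0\}}}{\tau_i^{3/2}\delta_i^{3/2}}\,\frac{\delta_i}{2\pi}\exp\Big\{-\frac{\delta_i(\varphi_{i,t_1}^2 + \varphi_{i,t'}^2)}{2}\Big\}\,\tau_i\exp\{-\tau_i(\eta_{i,t_1,m}^2 + \eta_{i,t',r}^2)\}\\
&\qquad\times\exp\{-(|\theta_{i,t_1} + \varphi_{i,t_1} + \eta_{i,t_1,m}| + |\theta_{i,t_1} + \varphi_{i,t_1} - \eta_{i,t_1,m}|)\}\\
&\qquad\times\exp\{-(|\theta_{i,t'} + \varphi_{i,t'} + \eta_{i,t',r}| + |\theta_{i,t'} + \varphi_{i,t'} - \eta_{i,t',r}|)\}\\
&\times\prod_{i=1}^{n}\prod_{t=1}^{T_i}\sqrt{\frac{\phi}{2\pi\Delta_{i,t}}}\exp\Big(-\frac{\phi\{\theta_{i,t} - \theta_{i,t-1} - c_i(1 - \rho\theta_{i,t-1})\Delta^{+}_{i,t}\}^2}{2\Delta_{i,t}}\Big).
\end{aligned}
$$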

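Next we integrate out $\tau_i$, $\eta_{i,t_1,m}$ and $\eta_{i,t',r}$ using Lemma 3. This gives the upper bound (again ignoring multiplicative constants)

$$
\begin{aligned}
&\frac{1}{\phi^{3/2}}\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi v_{G_i}}}\exp\Big\{-\frac{(\theta_{i,0} - \mu_{G_i})^2}{2v_{G_i}}\Big\}\frac{1_{\{c_i > 0\}}}{\delta_i^{1/2}}\exp\Big\{-\frac{\delta_i(\varphi_{i,t_1}^2 + \varphi_{i,t'}^2)}{2}\Big\}\exp\{-(|\theta_{i,t_1} + \varphi_{i,t_1}| + |\theta_{i,t'} + \varphi_{i,t'}|)\}\\
&\times\prod_{i=1}^{n}\prod_{t=1}^{T_i}\sqrt{\frac{\phi}{2\pi\Delta_{i,t}}}\exp\Big(-\frac{\phi\{\theta_{i,t} - \theta_{i,t-1} - c_i(1 - \rho\theta_{i,t-1})\Delta^{+}_{i,t}\}^2}{2\Delta_{i,t}}\Big).
\end{aligned}
\tag{C.4}
$$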

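Next we integrate out $\delta_i$, $\varphi_{i,t_1}$ and $\varphi_{i,t'}$ using Lemma 4. The resulting upper bound on (C.4) is

$$\frac{1}{\phi^{3/2}}\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi v_{G_i}}}\exp\Big\{-\frac{(\theta_{i,0} - \mu_{G_i})^2}{2v_{G_i}}\Big\}\frac{1_{\{c_i > 0\}}}{1 + |\theta_{i,t'}|}\prod_{i=1}^{n}\prod_{t=1}^{T_i}\sqrt{\frac{\phi}{2\pi\Delta_{i,t}}}\exp\Big(-\frac{\phi\{\theta_{i,t} - \theta_{i,t-1} - c_i(1 - \rho\theta_{i,t-1})\Delta^{+}_{i,t}\}^2}{2\Delta_{i,t}}\Big).$$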



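Integrating out all the $\theta_{i,t}$ except $\theta_{i,t'}$ gives the expression

$$\frac{1}{\phi^{3/2}}\prod_{i=1}^{n}\frac{1_{\{c_i > 0\}}}{1 + |\theta_{i,t'}|}\Big(\frac{\phi}{2\pi B(c_i)}\Big)^{1/2}\exp\Big\{-\frac{\phi\,\{\theta_{i,t'} - A(c_i)\}^2}{2B(c_i)}\Big\}, \tag{C.5}$$

where, for each individual $i$, $A(c_i)$ and $B(c_i)$ are as defined in Lemma 5 with $c = c_i$, $T = t'$, and $\Delta_{i,t}$, $\Delta^{+}_{i,t}$, $\mu_{G_i}$ and $v_{G_i}$ in place of $\Delta_t$, $\Delta^{+}_t$, $\mu_G$ and $v_G$.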

Finally, define

$$z_i = \frac{\theta_{i,t'} - A(c_i)}{\sqrt{B(c_i)/\phi}}.$$

The Jacobian of this change of variables cancels the normalizing factor $(\phi/\{2\pi B(c_i)\})^{1/2}$. We use Lemma 6 to integrate out $\theta_{i,t'}$ and $c_i$ for all but two individuals, and then Lemma 5 for the remaining variables of (C.5). It follows that the integral is finite, which completes the proof. $\square$